Add exponential backoff to linkcheck #6629
Reasonable. Could you make a PR please?
Hi. I’m trying to solve this. I built a solution based on priority queues, detailed below, but I’m not satisfied with it and would like to increase the scope to use
PriorityQueue
Replace the work The priority is
Each worker thread pulls from the queue. If the priority is in the future, requeue the message with the same priority and go to sleep.
Issues
Suggested changes
Next steps
If there’s interest in that plan, I’m happy to break the next steps into separate issues and tackle them.
Possible extension
To squeeze out even more performance for
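The worker loop from the priority-queue plan above can be sketched with the standard library's queue.PriorityQueue, using the time at which an item becomes due as its priority. Everything here is an illustrative assumption rather than the code that was eventually written: the run_worker name, the (due_time, url) item layout, and the hypothetical convention that check(url) returns None when a URL is done or a number of seconds to wait when it was rate-limited. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import queue
import time

def run_worker(work, check, clock=time.monotonic, sleep=time.sleep, poll=0.5):
    """Drain a PriorityQueue of (due_time, url) items; return URLs that finished.

    Hypothetical contract: check(url) returns None when the URL is done, or a
    number of seconds to wait before retrying (e.g. after an HTTP 429).
    """
    done = []
    while True:
        try:
            due, url = work.get_nowait()
        except queue.Empty:
            # Simplification: stop once the queue looks empty (single-threaded demo).
            return done
        try:
            now = clock()
            if due > now:
                # Not due yet: requeue with the same priority and sleep a little.
                work.put((due, url))
                sleep(min(poll, due - now))
                continue
            retry_in = check(url)
            if retry_in is None:
                done.append(url)
            else:
                # Rate-limited: reschedule with a later due time as the new priority.
                work.put((clock() + retry_in, url))
        finally:
            # Balance every get() so work.join() would still work.
            work.task_done()
```

Because PriorityQueue is thread-safe, several worker threads could run this loop against one shared queue; a real pool would stop them with a sentinel instead of exiting as soon as the queue looks empty.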
My main concern is who will maintain it. I'm not good with asyncio and aiohttp, so it would be nice if you became a maintainer of the new linkcheck builder. What do you think? Note: we have to make sure the new one works fine on Windows too.
Thanks for the quick feedback. I don’t mind maintaining
My day job is as a web developer (mostly Python and Django). I’ve been doing that for about 7 years, so I’m pretty familiar with the Web and Python. I’m new to async, but eager to work with it. This change is a good opportunity to become more familiar with async, and fixing the (hopefully few) issues arising from it will be a great learning experience. If my lack of async experience beyond a couple of personal test projects is a big concern, I’m okay sticking with the multi-threaded solution, the priority queue and
Sounds good :-) Let's move to the new architecture!
Follow the Retry-After header if present, otherwise use an exponential back-off.
Allow users to decide the wait time between retries and when to bail out.
Close sphinx-doc#7388
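The wait policy in the commit message above can be sketched as a single function. This is a rough illustration, not the code from the actual fix: it honours Retry-After when the header holds either delay-seconds or an HTTP-date, otherwise doubles a base delay on each attempt, and gives up once the wait would exceed a limit. The name next_wait and its base and max_wait parameters are hypothetical stand-ins for the user-configurable settings the commit mentions, and the defaults are arbitrary.

```python
import email.utils
from datetime import datetime, timezone

def next_wait(attempt, retry_after=None, base=1.0, max_wait=300.0, now=None):
    """Seconds to wait before retry number `attempt` (0-based), or None to bail out.

    `base` and `max_wait` are hypothetical user settings, not Sphinx options.
    """
    wait = None
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            # Retry-After: <delay-seconds>
            wait = float(value)
        else:
            # Retry-After: <HTTP-date>, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            try:
                when = email.utils.parsedate_to_datetime(value)
            except (TypeError, ValueError):
                when = None  # unparseable header: fall back to back-off
            if when is not None:
                if when.tzinfo is None:
                    when = when.replace(tzinfo=timezone.utc)
                current = now or datetime.now(timezone.utc)
                wait = max(0.0, (when - current).total_seconds())
    if wait is None:
        # No usable header: exponential back-off (base, 2*base, 4*base, ...)
        wait = base * 2 ** attempt
    # Give up instead of waiting longer than the user allows.
    return wait if wait <= max_wait else None
```

A caller would sleep for the returned number of seconds and retry, and report the link as failed once it gets None back.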
Fix #6629: linkcheck: Handle rate-limiting
Is your feature request related to a problem? Please describe.
We link to GitHub PRs/issues for each entry in our changelog, which is included in our documentation. That means we fire off a hundred or so checks to various pages on GitHub. Whenever GitHub is not running smoothly, we end up with timeouts, which slows down our development until it is reliable again.
Describe the solution you'd like
Allow the linkcheck builder to retry requests a few times (maybe 3, or a configurable number) with exponential backoff. https://www.peterbe.com/plog/best-practice-with-retries-with-requests gives a quick and easy implementation of how to do this with requests.get, which is what linkcheck currently uses to get pages.
Describe alternatives you've considered
Another thing that might help would be to collect all the links to check in one step, then sort them so that requests to the same domain are spread out rather than sent back to back. That is, you would send one request to each unique domain before sending a second request to any domain you have already contacted.
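The domain-ordering alternative above amounts to a round-robin over per-host buckets: group the collected URLs by host, then take one URL from each host in turn. A minimal sketch, where interleave_by_domain is a hypothetical helper and visiting hosts in order of first appearance is an arbitrary choice:

```python
from collections import deque
from urllib.parse import urlsplit

def interleave_by_domain(urls):
    """Reorder URLs so consecutive requests go to different hosts where possible."""
    # One FIFO bucket per host; dicts keep hosts in order of first appearance.
    buckets = {}
    for url in urls:
        buckets.setdefault(urlsplit(url).netloc, deque()).append(url)
    ordered = []
    while buckets:
        # One pass = at most one URL per host; drop hosts that run out.
        for host in list(buckets):
            pending = buckets[host]
            ordered.append(pending.popleft())
            if not pending:
                del buckets[host]
    return ordered
```

This only reorders the work; it does not enforce any delay between requests to the same host, so it would complement a retry policy rather than replace it.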