Deadlock on drop of reqwest::blocking::Response sometimes #746
Interesting... I noticed the blocking client is using futures-channel, which may still have a race condition when closing (rust-lang/futures-rs#909). Would you be able to check if #748 fixes this for you?
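One way to try an unreleased fix like this is a Cargo `[patch]` override pointing at the reqwest git repository; a sketch, with the branch name as a placeholder (point it at whatever branch the PR actually lives on):

```toml
# Hypothetical Cargo.toml patch for testing an in-flight fix.
# "some-fix-branch" is a placeholder, not the actual PR branch.
[patch.crates-io]
reqwest = { git = "https://github.com/seanmonstar/reqwest", branch = "some-fix-branch" }
```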
It indeed looks like the problem is resolved! Thanks for the quick fix!
Actually no, it seems to only make the deadlock less likely. Main thread backtrace:
I can reproduce this in v0.10.0 (git commit 35c6ddd). It happens 6% of the time (measured on a sample of 1,000,000 requests). I've modified the blocking example to make it read the URL from the command line and use a short timeout, and compiled it with:
Majestic Million CSV obtained here. Normally this prints "Done.", the binary exits, and the next iteration is immediately invoked, printing "GET ...". Within about a minute you can see it print "Done." without following up with "GET ..." and progress halts. I'm on Ubuntu 18.04, rustc stable 1.40.0.
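The modified example itself isn't shown in the thread; a minimal sketch of what it might look like, with the two-second timeout and program shape as assumptions:

```rust
// Illustrative reconstruction of the modified blocking example described
// above; the real code isn't included in the thread, so details are guesses.
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = std::env::args().nth(1).expect("usage: blocking <url>");
    println!("GET {}", url);

    // A short timeout so a hung request fails fast instead of stalling.
    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(2))
        .build()?;

    let body = client.get(&url).send()?.text()?;
    println!("Done. ({} bytes)", body.len());
    Ok(()) // the observed hang happens after this, during runtime shutdown
}
```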
A deadlock also happens on Windows, so this bug is probably not platform-specific. Thanks to @DutchGhost for testing.
I'm looking into this now. As a first step, I'm adding logs in #759 to try to narrow down where the hang is. My current test environment only has one CPU, so it's hard for me to reproduce races involving threads.
After some deep code diving, I have a slight hunch. Are you able to try your test with a list of URLs that have IP addresses for hosts, so that DNS isn't invoked?
Also, if possible, it'd be hugely helpful to see the stacks of the other threads. reqwest is hanging waiting for
It is still reproducible using IP addresses only. What's more, I got another, different hang this way! The first time around, the very first IP, 172.217.20.46, hung immediately for me, before it even downloaded the content. This might be a different bug? After that I did a bunch more runs with IPs, and it hangs the same way it did with domain names. I've simply run the first 3000 entries from majestic_million.csv through
Well shucks. Is it possible to investigate the process when it hangs with gdb?
I've never learned to use gdb, so I'm afraid I can't provide anything meaningful here, and remote debugging through a human proxy is generally inefficient. The code to reproduce the issue is linked above. If you need access to a multi-core machine, Google Cloud gives everyone a $300 free trial credit; you can just SSH into one of their VMs to debug this. I've already checked that the issue is reproducible there (on a 4-core machine, specifically).
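For reference, capturing the stacks being asked for takes only two gdb commands, assuming gdb is installed and you know the PID of the hung process:

```
# Attach to the hung process by PID, then dump every thread's backtrace.
gdb -p <pid>
(gdb) thread apply all bt
```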
Ok, I managed to reproduce and get a backtrace of the other thread:
Looks like it is caused by this: tokio-rs/tokio#2058 |
Should hopefully be fixed in the next version of Tokio (0.2.7). |
I have 2 threads and in each I run a blocking request.
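The commenter's code isn't shown; a sketch of that two-thread setup, assuming plain std threads and the blocking API, might look like:

```rust
// Hypothetical reconstruction of the two-thread scenario; the URL is a
// placeholder and the commenter's real code isn't in the thread.
use std::thread;

fn main() {
    let handles: Vec<_> = (0..2)
        .map(|i| {
            thread::spawn(move || {
                // Each thread issues its own blocking request.
                let body = reqwest::blocking::get("http://example.com/")
                    .and_then(|resp| resp.text());
                println!("thread {} finished: {:?}", i, body.map(|b| b.len()));
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap(); // with the tokio bug, this could hang
    }
}
```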
This bug, as described in seanmonstar/reqwest#746, was fixed by tokio-rs/tokio#2062.
@im-n1 The fix should have been released today. Can you try now (make sure you have Tokio 0.2.7+)?
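Since tokio is a transitive dependency of reqwest, bumping the locked version is usually all that's needed; for example:

```
# Bump the locked tokio version to the newest compatible release (0.2.7+),
# then verify the version recorded in Cargo.lock.
cargo update -p tokio
grep -A1 'name = "tokio"' Cargo.lock
```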
Works with Tokio 0.2.8. |
Works on Linux, both on single-core ARM and quad-core x86_64.
The issue with the check_site test timing out seems to be related to a similar reqwest issue (seanmonstar/reqwest#746). This was due to an upstream bug in tokio and should be fixed in tokio 0.2.7 onward.
Thanks for the tip. Unfortunately, figuring out which threads are involved in a deadlock is usually the easy part. The hard part is figuring out why, as the actual cause is almost never at the location of the deadlock.
Sure, debugging deadlocks is hard. I just meant to say that exposing such a feature would:
What would be the appropriate place to discuss adding such a feature?
Sometimes, when a reqwest::blocking::Response is dropped, it appears to deadlock and the drop call never returns. On my setup, it appears to work roughly 70% of the time. Tested on the latest master code as of right now.
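The reporter's exact code isn't included here; a minimal sketch that would produce the log below, with the URL, iteration count, and buffer size as placeholders:

```rust
// Hypothetical sketch matching the log output below; the URL and buffer
// size are placeholders, not the reporter's actual values.
use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    for _ in 0..10 {
        let mut resp = reqwest::blocking::get("http://example.com/")?;
        println!("ok, reading some data!");

        let mut buf = [0u8; 4096];
        let _ = resp.read(&mut buf)?; // blocking::Response implements std::io::Read

        println!("attempting to drop stream. This will deadlock SOMETIMES...");
        drop(resp); // sometimes never returns
        println!("we get here as expected ~70% of the time");
    }
    Ok(())
}
```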
```
ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...
we get here as expected ~70% of the time
ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...
we get here as expected ~70% of the time
ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...
we get here as expected ~70% of the time
ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...
then, stuck.
```