
Deadlock on drop of reqwest::blocking::Response sometimes #746

Closed
karlri opened this issue Dec 19, 2019 · 21 comments · Fixed by tokio-rs/tokio#2062
Labels
bug (This is bug!) · upstream (Blocked: upstream. Depends on a dependency to make a change first.)

Comments

@karlri

karlri commented Dec 19, 2019

Sometimes, when a reqwest::blocking::Response is dropped, it appears to deadlock and the drop call never returns. On my setup, it works roughly 70% of the time. Tested on the latest master code as of right now.

fn main() {
    for _ in 0..10 {
        may_deadlock();
    }
}

fn may_deadlock() {
    let stream_url = "https://live-bauerse-fm.sharp-stream.com/nrj_instreamtest_se_mp3";

    let mut stream = {
        // setup
        let mut builder = reqwest::blocking::Client::builder();
        // note, I used a proxy:
        // builder = builder.proxy(reqwest::Proxy::all("----").unwrap());
        // get the stream
        builder = builder.connect_timeout(std::time::Duration::from_secs(2));
        builder = builder.timeout(std::time::Duration::from_secs(2));
        let client = builder.build().unwrap();
        let builder = client.get(stream_url);
        builder.send().unwrap()
    };

    println!("ok, reading some data!");
    let mut buffer = [0; 1400];
    use std::io::Read;
    stream.read(&mut buffer).unwrap();
    println!("attempting to drop stream. This will deadlock SOMETIMES...");
    std::mem::drop(stream);
    println!("we get here as expected ~70% of the time");
}

ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...
we get here as expected ~70% of the time
ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...
we get here as expected ~70% of the time
ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...
we get here as expected ~70% of the time
ok, reading some data!
attempting to drop stream. This will deadlock SOMETIMES...

then, stuck.

@seanmonstar
Owner

Interesting... I noticed the blocking client is using futures-channel, which may still have a race condition when closing (rust-lang/futures-rs#909). Would you be able to check if #748 fixes this for you?

@karlri
Author

karlri commented Dec 20, 2019

It indeed looks like the problem is resolved! Thanks for the quick fix!
Edit: Sorry, spoke too soon; see below.

@karlri
Author

karlri commented Dec 20, 2019

Actually no, it only seems to make the deadlock less likely. Main thread backtrace:

#0  0x00007f4081a5f6dd in pthread_join (
    threadid=139915023746816, thread_return=0x0)
    at pthread_join.c:90
#1  0x00005569486add5d in std::sys::unix::thread::Thread::join::h0dd4634e5c3886ac ()
    at src/libstd/sys/unix/thread.rs:179
#2  0x00005569482e3f51 in std::thread::JoinInner$LT$T$GT$::join::h1f2d05cc866272cd (
    self=0x7ffd3c600b48)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/mod.rs:1327
#3  0x00005569482e3fc2 in std::thread::JoinHandle$LT$T$GT$::join::he18ad55cd4a5ccd1 (self=...)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/thread/mod.rs:1460
#4  0x000055694834b5f2 in _$LT$reqwest..blocking..client..InnerClientHandle$u20$as$u20$core..ops..drop..Drop$GT$::drop::_$u7b$$u7b$closure$u7d$$u7d$::h6322640f342aadad (h=...)
    at reqwest/src/blocking/client.rs:581
#5  0x000055694828a6d0 in core::option::Option$LT$T$GT$::map::hd88b77d2e830f5b9 (self=..., f=...)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/option.rs:447
#6  0x00005569482e1a50 in _$LT$reqwest..blocking..client..InnerClientHandle$u20$as$u20$core..ops..drop..Drop$GT$::drop::hb6fe9cab38400c64 (
    self=0x55694a62b8e0)
    at reqwest/src/blocking/client.rs:581
#7  0x00005569481c69b5 in core::ptr::real_drop_in_place::hf1dea16a8e08991a ()
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ptr/mod.rs:175
#8  0x00005569482a7a14 in core::ptr::drop_in_place::h387838fc20a3f1ac (to_drop=0x55694a62b8e0)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ptr/mod.rs:165
#9  alloc::sync::Arc$LT$T$GT$::drop_slow::hfa5e9608c48264e2 (self=0x7ffd3c6024a8)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/sync.rs:703
#10 0x00005569482a8958 in _$LT$alloc..sync..Arc$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::hb7e0b751c8637854 (self=0x7ffd3c6024a8)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/sync.rs:1225
#11 0x00005569481a9b3f in core::ptr::real_drop_in_place::h3462ae8feecfdbef ()
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ptr/mod.rs:175
#12 0x00005569481aa37b in core::ptr::real_drop_in_place::ha31e0549b0554145 ()
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ptr/mod.rs:175
#13 0x00005569481a9e6e in core::ptr::real_drop_in_place::h4fd0c8f05dcadaa2 ()
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ptr/mod.rs:175
#14 0x00005569481aa19e in core::ptr::real_drop_in_place::h8b9a13bff1d36747 ()
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/ptr/mod.rs:175
#15 0x00005569481a9466 in core::mem::drop::hc9a625d36b0c0444 (_x=...)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libcore/mem/mod.rs:706
#16 0x00005569481ac8e3 in reqwest_lockup::may_deadlock::h9b97e9fee10e3133 ()
    at src/main.rs:27
#17 0x00005569481ac40b in reqwest_lockup::main::h5f0d353239e74092 () at src/main.rs:3
#18 0x00005569481ac190 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hdf09220ccec15447 () at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/rt.rs:64
#19 0x00005569486aad83 in std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::ha04574f12d97cbe2 () at src/libstd/rt.rs:49
#20 std::panicking::try::do_call::h7c2a8488f72db90c () at src/libstd/panicking.rs:292
#21 0x00005569486ae8fa in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:80
#22 0x00005569486ab90d in std::panicking::try::hc3a9b5da4250385d ()
    at src/libstd/panicking.rs:271
#23 std::panic::catch_unwind::hf27600bf8c37809a () at src/libstd/panic.rs:394
#24 std::rt::lang_start_internal::h409d4f2fe51133b0 () at src/libstd/rt.rs:48
#25 0x00005569481ac169 in std::rt::lang_start::h2d8bb13c8f7b0694 (
    main=0x5569481ac390 <reqwest_lockup::main::h5f0d353239e74092>, argc=1, 
    argv=0x7ffd3c602858)
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/libstd/rt.rs:64
#26 0x00005569481aca5a in main ()

It appears to hang at client.rs line 581 on that branch, at: self.thread.take().map(|h| h.join());

other threads:
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f408297d900 (LWP 16842))]
#0  0x00007f4081a5f6dd in pthread_join (threadid=139915023746816, thread_return=0x0)
    at pthread_join.c:90
90      in pthread_join.c
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f40814a0700 (LWP 16844))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
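
For context, a minimal sketch of the pattern behind that join (not reqwest's actual source, just an illustration): a blocking handle that owns a background worker thread and joins it in Drop. If the worker never exits its loop, the join at the equivalent of client.rs:581 blocks forever, which matches the main-thread backtrace above.

use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for reqwest's inner client handle: it owns a worker
// thread plus a channel sender, and dropping it closes the channel and then
// joins the worker (the equivalent of `self.thread.take().map(|h| h.join());`).
struct ClientHandle {
    tx: Option<mpsc::Sender<&'static str>>,
    thread: Option<thread::JoinHandle<()>>,
}

impl ClientHandle {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel();
        let thread = thread::spawn(move || {
            // Worker loop: runs until all senders are dropped. If this thread
            // instead blocked on something else (say, a contended runtime lock
            // during shutdown), the join in Drop would hang forever.
            for msg in rx {
                println!("worker got: {}", msg);
            }
        });
        ClientHandle { tx: Some(tx), thread: Some(thread) }
    }
}

impl Drop for ClientHandle {
    fn drop(&mut self) {
        // Dropping the sender signals the worker to shut down...
        drop(self.tx.take());
        // ...and this join is the step that never returns in the report above.
        if let Some(h) = self.thread.take() {
            let _ = h.join();
        }
    }
}

fn main() {
    let handle = ClientHandle::new();
    handle.tx.as_ref().unwrap().send("ping").unwrap();
    drop(handle); // joins the worker; only hangs if the worker never exits
}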

@Shnatsel

I can reproduce this in v0.10.0 (git commit 35c6ddd). Happens 6% of the time (measured on a sample of 1,000,000 requests).

I've modified the blocking example to read the URL from the command line and use a short timeout, compiled it with cargo build --release --features="native-tls,gzip,blocking,cookies" --example blocking, and invoked it in a loop like this:

cut -d , -f 3 ../majestic-million-31dec2019.csv | while read url; do target/release/examples/blocking "$url"; done

Majestic million CSV obtained here.

Normally this prints "Done.", the binary exits, and the next iteration is immediately invoked, printing "GET ...". Within about a minute you can see it print "Done." without a following "GET ...", and progress halts.

I'm on Ubuntu 18.04, rustc stable 1.40.0.
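
The exact modifications aren't shown in the thread, but a minimal sketch of such a modified example, assuming reqwest 0.10's blocking API, might look roughly like this:

use std::time::Duration;

// Assumes reqwest 0.10 with the "blocking" feature enabled in Cargo.toml.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Take the URL from the command line, e.g. `blocking example.com`.
    let arg = std::env::args().nth(1).expect("usage: blocking <url>");
    let url = if arg.starts_with("http") { arg } else { format!("http://{}", arg) };

    println!("GET {}", url);
    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(2)) // short timeout, as described above
        .build()?;

    match client.get(url.as_str()).send() {
        Ok(resp) => println!("status: {}", resp.status()),
        Err(e) => println!("request failed: {}", e),
    }

    // If the process hangs, it does so after this line, while the response
    // and client are dropped and the runtime shuts down.
    println!("Done.");
    Ok(())
}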

@Shnatsel

Shnatsel commented Jan 1, 2020

A deadlock also happens on Windows, so this bug is probably not platform-specific. Thanks to @DutchGhost for testing.

@seanmonstar added the bug label Jan 2, 2020
@seanmonstar
Owner

I'm looking into this now. As a first step, I'm adding logs in #759 to help narrow down where the hang is. My current test environment only has 1 CPU, so it's hard for me to reproduce races involving threads.

@seanmonstar
Owner

After some deep code diving, I have a slight hunch. Are you able to try your test with a list of URLs that have IP addresses for hosts, so that DNS isn't invoked?

@seanmonstar
Owner

Also, if possible, it'd be hugely helpful to see the stacks of the other threads. reqwest is hanging waiting for thread.join, so the other thread isn't finishing up for some reason.

@Shnatsel

Shnatsel commented Jan 2, 2020

It is still reproducible using IP addresses only. What's more, I got another, different hang this way!

The first time around the very first IP 172.217.20.46 hung immediately for me, before it even downloaded the content. This might be a different bug?

After that I've done a bunch more runs with IPs and it does hang the same way it did with domain names. I've simply run the first 3000 entries from majestic_million.csv through nslookup to get the IP list.
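
For illustration (the thread used nslookup for this step), a rough Rust sketch of pre-resolving hostnames to IP addresses with std::net::ToSocketAddrs, so the test URLs can use IPs and the requests never invoke DNS:

use std::net::ToSocketAddrs;

fn main() {
    // Placeholder hostnames; the thread used the first 3000 Majestic Million entries.
    let hosts = ["example.com", "example.org"];
    for &host in &hosts {
        // Port 80 is only needed to satisfy ToSocketAddrs; the port is discarded.
        match (host, 80u16).to_socket_addrs() {
            Ok(addrs) => {
                for addr in addrs {
                    println!("{} {}", host, addr.ip());
                }
            }
            Err(e) => eprintln!("{}: resolution failed: {}", host, e),
        }
    }
}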

@seanmonstar
Owner

Well, shucks. Is it possible to investigate the process with gdb when it hangs and see the stacks of the other threads?

@Shnatsel

Shnatsel commented Jan 2, 2020

I've never learned to use gdb, so I'm afraid I cannot provide anything meaningful here. And remote debugging through a human proxy is generally inefficient.

The code to reproduce the issue is linked above. If you need access to a multi-threaded machine, Google Cloud gives everyone $300 in free trial credit; you can just ssh into one of their VMs to debug this. I've already checked that the issue is reproducible there (on a 4-core machine, specifically).

@seanmonstar
Owner

OK, I managed to reproduce this and get a backtrace of the other thread:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff70b3bb5 in __GI___pthread_mutex_lock (mutex=0x7ffff00023a0) at ../nptl/pthread_mutex_lock.c:80
#2  0x00005555556d8704 in tokio::runtime::basic_scheduler::SchedulerPriv::schedule::hd36e54f11b45fb84 ()
#3  0x00005555555ea09f in tokio::task::waker::wake_by_val::h24030994c56cf063 ()
#4  0x00005555555e4482 in want::Taker::signal::hff5e4bb4f98d0e72 ()
#5  0x00005555556058c5 in core::ptr::real_drop_in_place::h5d7ab3cea2a38b95 ()
#6  0x0000555555608ee7 in core::ptr::real_drop_in_place::hcdf945c54d6363fa ()
#7  0x000055555560dd04 in tokio::task::core::Core$LT$T$GT$::transition_to_consumed::h97b8b14f387dce59 ()
#8  0x0000555555715aaa in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:78
#9  0x000055555565f172 in tokio::task::harness::Harness$LT$T$C$S$GT$::do_cancel::h823276277b2d0293 ()
#10 0x00005555555d1956 in tokio::task::queue::MpscQueues$LT$S$GT$::close_remote::h039153b4ef2d9c63 ()
#11 0x00005555555d1c7d in tokio::task::queue::MpscQueues$LT$S$GT$::shutdown::ha2ff88c001266a59 ()
#12 0x00005555555f5d3d in _$LT$tokio..runtime..basic_scheduler..BasicScheduler$LT$P$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::he8b1ff9afa3edcfc ()
#13 0x0000555555631748 in core::ptr::real_drop_in_place::h091f7f6e24d2b1f4 ()

@seanmonstar added the upstream label Jan 6, 2020
@carllerche

Looks like it is caused by this: tokio-rs/tokio#2058

@seanmonstar
Owner

Should hopefully be fixed in the next version of Tokio (0.2.7).

@seanmonstar reopened this Jan 7, 2020
@im-n1

im-n1 commented Jan 7, 2020

I have 2 threads, and in each I run a blocking get(). I get a deadlock 100% of the time.
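
A minimal sketch of that scenario (placeholder URL, reqwest 0.10 blocking API assumed):

use std::thread;

fn main() {
    // Two threads, each performing a blocking GET; with the affected tokio
    // versions this reportedly deadlocked every time.
    let handles: Vec<_> = (0..2)
        .map(|i| {
            thread::spawn(move || {
                let resp = reqwest::blocking::get("http://example.com")
                    .expect("request failed");
                println!("thread {} got status {}", i, resp.status());
                // The hang was on drop of the response/client after this point.
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}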

bpetlert added a commit to bpetlert/pacman-mirrorup that referenced this issue Jan 7, 2020
@carllerche

@im-n1 The fix should have been released today. Can you try now? (Make sure you have Tokio 0.2.7+.)

@im-n1

im-n1 commented Jan 8, 2020

Works with Tokio 0.2.8.

@karlri
Author

karlri commented Jan 8, 2020

Works on Linux, both on single-core ARM and quad-core x86_64.

@karlri closed this as completed Jan 8, 2020
samford added a commit to samford/zola that referenced this issue Jan 8, 2020
The issue with the check_site test timing out seems to be related to
a similar reqwest issue (seanmonstar/reqwest#746). This was due to an
upstream bug in tokio and should be fixed in tokio 0.2.7 onward.
samford added a commit to samford/zola that referenced this issue Jan 8, 2020
The issue with the check_site test timing out seems to be related to
a similar reqwest issue (seanmonstar/reqwest#746). This was due to an
upstream bug in tokio and may be fixed in tokio 0.2.7 onward.
@gnzlbg

gnzlbg commented Jan 17, 2020

parking_lot primitives have deadlock detection that can be enabled with a cargo feature, so making reqwest or tokio panic on a deadlock should be "as simple" as enabling that feature. The panic will output the backtraces of all threads involved in the deadlock.
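
For illustration, a rough sketch of how that could look on the consuming side, assuming the locks involved are parking_lot locks and the deadlock_detection cargo feature is enabled. parking_lot does not panic by itself; you poll its detector from a watchdog thread:

// Cargo.toml (assumed): parking_lot = { version = "0.12", features = ["deadlock_detection"] }
use std::thread;
use std::time::Duration;

fn spawn_deadlock_watchdog() {
    thread::spawn(|| loop {
        thread::sleep(Duration::from_secs(5));
        // Returns one Vec per detected deadlock cycle; empty if none.
        let deadlocks = parking_lot::deadlock::check_deadlock();
        if deadlocks.is_empty() {
            continue;
        }
        eprintln!("{} deadlock(s) detected", deadlocks.len());
        for (i, threads) in deadlocks.iter().enumerate() {
            eprintln!("deadlock #{}", i);
            for t in threads {
                eprintln!("thread id {:#?}", t.thread_id());
                eprintln!("{:#?}", t.backtrace());
            }
        }
        // Panics only this watchdog thread; call std::process::abort() instead
        // if the whole process should go down when a deadlock is found.
        panic!("deadlock detected");
    });
}

fn main() {
    spawn_deadlock_watchdog();
    // ... application code that uses parking_lot mutexes ...
}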

@carllerche

Thanks for the tip. Unfortunately, figuring out which threads are involved in the deadlock is usually the easy part. The hard part is figuring out why, since the actual cause is almost never at the location of the deadlock.

@gnzlbg

gnzlbg commented Jan 17, 2020

Sure, debugging deadlocks is hard.

I just meant to say that exposing such a feature would:

  • allow users who want to protect themselves against deadlocks, and who are willing to pay the runtime cost, to do so (this turns deadlocks into panics, so they could be used to restart things)
  • help people file better bug reports: one often discovers a deadlock because the application does not terminate, and at least I am often unsure whether some work is still ongoing or it really is a deadlock. Being able to flip a flag and reproduce with a backtrace is a huge time saver and allows filing a more informative bug report. Although, as you mention, the road from there to actually finding the problem is long.

What would be the appropriate place to discuss adding such a feature?
