Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReentrantMutex hangs with fibers, even after 3.11.0 #552

Open
rmosolgo opened this issue Oct 26, 2022 · 7 comments
Open

ReentrantMutex hangs with fibers, even after 3.11.0 #552

rmosolgo opened this issue Oct 26, 2022 · 7 comments

Comments

@rmosolgo
Copy link

Subject of the issue

Hi! I'm following up on a reported Rspec hangup with GraphQL-Ruby's "Dataloader," a fiber-based data batching library (Original bug report: rmosolgo/graphql-ruby#4172). The original report also included a replication: khamusa/gql-dataloader-rspec-mocks-hanging@fd6c05b#diff-cd6746fe941432087bd4ad1ba284ce7e6d3f94ef62e3954e8c04fcc007ba3968R3

I see that there has been recent work on this in #503, etc, but even on 3.11.0, I see this stuck process.

I cloned the replication repo and triggered the reported bug, then inspected the stuck ruby process with rbspy snapshot, and here's what I saw:

Stack trace of stuck ruby process

block in spawn_fiber - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/dataloader.rb:304
block in run - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/dataloader.rb:183
block (3 levels) in coerce_arguments - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/member/has_arguments.rb:249
block in evaluate_selection - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:429
evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:530
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:929
block in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:529
call_method_on_directives - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:830
block (2 levels) in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:518
trace - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/tracing.rb:67
block (3 levels) in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:502
resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:747
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block in resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:727
with_extensions - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:836
block (2 levels) in resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:723
public_send [c function] - (unknown):0
resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/relay_classic_mutation.rb:70
resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:116
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:115
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block (2 levels) in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:113
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block (3 levels) in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:112
public_send [c function] - (unknown):0
resolve - /Users/rmosolgo/code/gql-dataloader-rspec-mocks-hanging/spec/debugging_issue_spec.rb:24
block (2 levels) in define_proxy_method - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/method_double.rb:65
proxy_method_invoked - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/verifying_proxy.rb:162
proxy_method_invoked - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/method_double.rb:81
message_received - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/proxy.rb:367
message_received - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/proxy.rb:232
invoke - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:476
invoke_incrementing_actual_calls_by - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:630
call - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:765
map [c function] - (unknown):0
block in call - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:764
block (5 levels) in <top (required)> - /Users/rmosolgo/code/gql-dataloader-rspec-mocks-hanging/spec/debugging_issue_spec.rb:78
block in let - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:343
fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:182
fetch [c function] - (unknown):0
block in fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:181
synchronize - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:26
enter - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:36
lock [c function] - (unknown):0

It seems to me like GraphQL-Ruby is executing a query, which happens inside a newly-created fiber, and then that query calls a mocked method (specifically, a method with a block-style mock), and during that block, it calls a let(...)-ed method, and during that call, the process hangs. (The let(...)ed method never returns.)

I expect this test to continue running, but is there another way it should be written? (Or, maybe something is out-of-sorts with GraphQL-Ruby's use of Fibers?)

Your environment

  • Ruby version: 3.0.3
  • rspec-support version: 3.11.0

Steps to reproduce

git clone git@github.com:khamusa/gql-dataloader-rspec-mocks-hanging.git
cd gql-dataloader-rspec-mocks-hanging
git checkout bug-demo
bundle
bundle exec rspec

Expected behavior

The let(...) value should return and the test should continue running

Actual behavior

The test hangs forever, with the rbspy snapshot pointing at:

fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:182
fetch [c function] - (unknown):0
block in fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:181
synchronize - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:26
enter - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:36
lock [c function] - (unknown):0
@JonRowe
Copy link
Member

JonRowe commented Oct 26, 2022

👋 Seeing as you likely have knowledge of whats going on with fibers in the GraphQL package, can you setup a reproduction without the GraphQL library but just an equivalent usage of Fibers? Is Thread.current disrupted within your usage?

@rmosolgo
Copy link
Author

Hey @JonRowe, sure thing, I just worked up a minimal reproduction here: rmosolgo@8a32cff

Interestingly, I found that subject(...) was also required to replicate the problem -- if I moved that call into the test body, it didn't hang!

I don't think there's any messing with Thread.current in the code there.

@mttkay
Copy link

mttkay commented Nov 2, 2022

I don't this is a problem with ReentrantMutex; it is likely a duplicate of this issue with rspec-core: rspec/rspec-core#2064 due how locking semantics changed with Ruby 3.

@mttkay
Copy link

mttkay commented Nov 2, 2022

I actually verified that ReentrantMutex is not the cause in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/102625#note_1156203743 by patching rspec-support to use Ruby's Monitor type, which is also reentrant.

Speaking of which, why does rspect-support not use Monitor to provide reentrancy?

@pirj
Copy link
Member

pirj commented Nov 2, 2022 via email

@mttkay
Copy link

mttkay commented Nov 2, 2022

Ah I see - makes sense 👍

I wonder if this work is necessary at this point since it is being discussed to remove locks around memoized blocks completely in rspec/rspec-core#2064

I would say let's wait for a decision about what should happen to threadsafe since that is the crux of the matter, not the lock implementation.

@JonRowe
Copy link
Member

JonRowe commented Nov 3, 2022

If you research the original development of this you'll find Monitor was actually the source of the original ReentrantMutex definition, with the desired chunks being essentially vendored to remove the dependency on it (we consider anything that requires a require a dependency to be minimized to avoid environment poisoning).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants