Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive Schema.execute call blocks forever with dataloader #4235

Closed
tchak opened this issue Oct 23, 2022 · 10 comments
Closed

Recursive Schema.execute call blocks forever with dataloader #4235

tchak opened this issue Oct 23, 2022 · 10 comments

Comments

@tchak
Copy link

tchak commented Oct 23, 2022

Describe the bug

We are in the process of updating graphql gem and migrating to use dataloader (nonblocking: false). We run into an issue that comes up due to a bit of weirdness in our application. Some of our mutations during their execution call Schema.execute (we use graphql queries as serialization for objects on our audit logs). What happens is that calls to dataloader during this execution inside mutations block forever. Not really knowing what I am doing, I tried to pass a new dataloader instance to the internal execute call context. It solved the blocking forever problem but now I am getting an empty data hash from the internal execute with no errors. I can provide more details if needed, but maybe it's absolutely terrible and unreasonable what we are doing ?

Versions

graphql version: 1.13.6 and 2
rails: 6.1.6.1

@rmosolgo
Copy link
Owner

Hi, sorry for the trouble! I would like to support re-entry, but I never actually considered it before. I'll take a look soon -- the best approach I can think of is to add a test for this scenario and start tweaking things until it works again...

@rmosolgo
Copy link
Owner

I tried adding a basic test for this in 3c35bc8 and it worked fine with a blocking dataloader, with Ruby-based scheduler and with Libev::Scheduler. However, with Evt::Scheduler, the re-entry test worked, but the thread-based parallelism test failed.

That made me wonder, what are you using for Fiber.scheduler?

@tchak
Copy link
Author

tchak commented Oct 27, 2022

Now, I am wondering, maybe I am doing something wrong. We are NOT using a scheduler and we DON'T have option nonblocking set to true. Maybe I misunderstood the documentation and it's not supposed to work?

@rmosolgo
Copy link
Owner

Derp, sorry -- I read nonblocking: false as nonblocking: true in your description above 🤦

The test I added also covers nonblocking: false, so it must be something a little more complicated than a plain recursive GraphQL call.

Could you try debugging with rbspy? You could get your development server stuck, then use rbspy snapshot --pid ... > backtrace.txt to figure out what the process is doing (or waiting for), then share the stack trace here. That would give us a clue what's going on when the application gets stuck.

For another example of how that's done, I'm debugging a similar issue with Rspec right now where I used rbspy snapshot to debug: rspec/rspec-support#552

@tchak
Copy link
Author

tchak commented Oct 28, 2022

Cool. Thanks for the tips – I will try that next week.

@tchak
Copy link
Author

tchak commented Jan 24, 2023

I am not getting much from rbspy :

block in spawn_thread - /Users/USERNAME/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/activerecord-6.1.7.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:348
sleep [c function] - (unknown):0

We will upgrade to rails 7 soon, maybe it will help?

@tchak
Copy link
Author

tchak commented Jun 6, 2023

After upgrading to rails 7 and latest ruby graphql (2.0.22) I am getting an error instead of infinite lock. This is still without nonblocking option.

deadlock; lock already owned by another fiber belonging to the same thread\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `enter'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `block in mon_enter'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/dependencies/interlock.rb:41:in `permit_concurrent_loads'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `mon_enter'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:22:in `block in synchronize'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.0.4.3/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activerecord-7.0.4.3/lib/active_record/connection_adapters/abstract/query_cache.rb:128:in `cache_sql'\n" +
   "    /Users/x/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activerecord-7.0.4.3/lib/active_record/connection_adapters/abstract/query_cache.rb:107:in `select_all'\n" +
   "    ... and 37 more lines\n"

@arathunku
Copy link
Contributor

arathunku commented Jun 27, 2023

@tchak did you find any workaround for the problem? We're running into exactly the issue and have a similar case. The situation might get better with https://blog.saeloun.com/2022/02/23/rails-fiber-safe-connection-pools/ but it's not yet released.

EDIT: Looks like we need to wait for Rails v7.1 - rails/rails@ea54939 rails/rails#46553 (looks all related)

@btoews
Copy link

btoews commented Oct 2, 2023

I ran into this issue, or maybe just a similar one. It came down to trying to use an ActiveRecord dataloader within a transaction. The transaction acquires an exclusive lock on the database connection. The dataloader source then runs in a separate fiber and needs to also acquire a lock on the connection to run the query. This second lock deadlocks.

My fix was to avoid using the dataloader within transactions. To catch future instances of this problem without suffering the deadlocks, I am checking the lock in my dataloader source before attempting a query:

  class ActiveRecordSource < GraphQL::Dataloader::Source
    attr_reader :model_class

    def initialize(model_class)
      @model_class = model_class
    end

    def fetch(ids)
      if model_class.connection.lock.mon_locked? && !model_class.connection.lock.mon_owned?
        raise "cannot use dataloader within transaction"
      end

      instances = model_class.where(id: ids).index_by(&:id)
      ids.map { |id| instances[id] }
    end
  end

I think the proper solution for this is to have GraphQL::Dataloader::Source define an #enabled? callback that can be overridden. AR sources can disable the dataloader when a transaction is in progress.

@rmosolgo
Copy link
Owner

Hey, sorry we never got to the bottom of this. @btoews, thanks for sharing what worked for you. Hopefully that will help others who are using Dataloader within a transaction.

Rails 7.1 has config.active_support.isolation_level = :fiber which addressed a lot of issues when using AsyncDataloader. If anyone runs into this problem on Rails 7.1 with that feature enabled, could you please open a new issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants