Connection pool exhaustion from delivery thread serializing reports #812

Open
evanlok opened this issue Jan 19, 2024 · 5 comments
Labels
backlog (We hope to fix this feature/bug in the future), needs discussion (Requires internal analysis/discussion)

Comments


evanlok commented Jan 19, 2024

Describe the bug

Bugsnag's delivery thread can check out an ActiveRecord database connection that is never returned to the pool. Subsequent attempts to check out a connection then time out waiting on the exhausted pool.

Steps to reproduce

  1. An ActiveJob job receives an ActiveRecord instance as a job argument
  2. The job raises an exception
  3. Bugsnag creates a report that contains the serialized job arguments
  4. The ActiveRecord class overrides to_s, which is called when the instance is serialized by Bugsnag. The to_s method queries the database.
  5. Bugsnag's delivery thread checks out a database connection from the connection pool when to_s is invoked by Bugsnag::Cleaner#traverse_object
  6. The connection pool has now permanently lost a connection to the Bugsnag delivery thread (see the sketch below)
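To illustrate the mechanism, here is a minimal sketch (plain ActiveRecord, outside of Bugsnag) of how a query issued from a long-lived background thread leaks a pooled connection; it assumes the Foo and Bar models from the example snippet below:

# Any ActiveRecord query implicitly checks out a connection for the
# current thread; a long-lived thread never returns it, so the pool
# permanently loses one slot.
foo = Foo.first # loaded on the main thread, like a deserialized job argument

leaker = Thread.new do
  foo.to_s # Bar.first inside to_s checks out a connection for this thread
  sleep    # the thread stays alive (like Bugsnag's delivery thread),
           # so the connection is never returned to the pool
end

sleep 1
pp ActiveRecord::Base.connection_pool.stat # :busy includes the leaked connection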

Environment

  • Bugsnag version: 6.26.0
  • Ruby version: 3.0.5
  • Bundler version: 2.3.26
  • Integration framework version:
    • Rack: 2.2.8
    • Rails: 6.1.6
    • Sidekiq: 7.1.2

Example code snippet

# Run a single instance of sidekiq worker with concurrency 2

class Foo < ApplicationRecord
  def to_s
    Bar.first # Database query triggered during Bugsnag report delivery
    "foo"
  end
end

class Bar < ApplicationRecord
end

class MyJob < ApplicationJob
  def perform(foo)
    Bar.first # Query will time out waiting for connection
    sleep 10
    raise "Fail"
  end
end

Foo.create
Bar.create
10.times { MyJob.perform_later(Foo.first) }
Error messages:


mclack commented Feb 23, 2024

Hi @evanlok

Apologies for the delay in response, and thanks for your patience.

In order for us to investigate this further, can you please confirm whether our understanding of what you are doing in the example is correct?

  1. You're running Sidekiq supporting 2 concurrent jobs
  2. You're enqueueing 10 jobs to run, passing in the reloaded Foo instance
  3. The jobs are sleeping for 10 seconds, then raising a failure
  4. When the failure is raised, BugSnag serializes the argument, which is a Foo
  5. Serializing the argument calls to_s on it
  6. This makes further network requests, and doesn’t return the connection to the pool

Please elaborate on anything here that may not be exactly correct.

@mclack added the awaiting feedback (Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks.) label on Feb 23, 2024

evanlok commented Feb 23, 2024

Yes, that is correct.


clr182 commented Mar 1, 2024

Hi @evanlok

Can you please elaborate on why you are making a database query in to_s? We generally wouldn't recommend making a database call in this way, as to_s is a standard Ruby method used to cast an object to its string representation. You wouldn't really expect it to do anything other than describe the in-memory object.
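For example, a to_s that stays purely in-memory might look like the following sketch (the name attribute is assumed for illustration and isn't from the original example):

class Foo < ApplicationRecord
  def to_s
    # Built only from attributes already loaded on the instance;
    # no additional queries, so nothing is checked out of the pool.
    "Foo##{id} #{name}"
  end
end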

@evanlok
Copy link
Author

evanlok commented Mar 1, 2024

The to_s method was querying for related configuration to determine the string output. We removed the database call from the method to resolve the issue, but it's possible this could happen again in the future. It was quite difficult to debug initially, so it would be helpful if there were a way for Bugsnag to prevent this or warn when it occurs.
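For anyone who does need a query inside to_s, one possible mitigation (a sketch, not something Bugsnag does for you) is to check the connection back in explicitly so the delivery thread can't hold it indefinitely:

class Foo < ApplicationRecord
  def to_s
    # with_connection checks a connection out for the block and returns
    # it to the pool afterwards, even when called from a non-request
    # thread such as Bugsnag's delivery thread.
    ActiveRecord::Base.connection_pool.with_connection { Bar.first }
    "foo"
  end
end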


mclack commented Mar 15, 2024

Hi @evanlok

Thanks for your patience on this.

We now have a task on our backlog to discuss and explore our options here when priorities allow. I can't currently provide an ETA on when this will be looked at, but we will post any further updates on this thread as discussions or developments happen.

@mclack added the needs discussion (Requires internal analysis/discussion) and backlog (We hope to fix this feature/bug in the future) labels and removed the awaiting feedback label on Mar 15, 2024