Messages dropped because of a full thread queue #816

Closed
krisdigital opened this issue Feb 29, 2024 · 6 comments


krisdigital commented Feb 29, 2024

Describe the bug

When the thread in the thread queue dies, messages are pushed on the queue and never delivered until the process restarts.

Steps to reproduce

See the example code below. We found out by accident that messages were stuck and not delivered to Bugsnag. The logs contained the message "Dropping notification, 101 outstanding requests".

Environment

  • Bugsnag version: 6.26.3
  • Ruby version: 3.3.0
  • Bundle version: 2.5.6
  • Integration framework version:
    • Que:
    • Rack: 13.1.0
    • Rails: 7.1.3.2
    • Rake:
    • Sidekiq: 7.2.2
    • Other:

Example code snippet

This example code mirrors the behaviour in lib/bugsnag/delivery/thread_queue.rb:

p 'Start'
queue = Queue.new

queue.push(proc do
  p '1'
end)

queue.push(proc do
  p '2'
end)

queue.push(proc do
  p 1 / 0
end)

queue.push(proc do
  p '3'
end)

worker_thread = Thread.new do
  p 'Thread Start'
  while x = queue.pop
    x.call
  end
end

p "Alive: #{worker_thread.alive?}, Status: #{worker_thread.status}"

# worker_thread.join
sleep 3

p "Alive: #{worker_thread.alive?}, Status: #{worker_thread.status || 'nil'}"

Output:

"Start"
"Alive: true, Status: run"
"Thread Start"
"1"
"2"
#<Thread:0x00000001050fdb00 threads.rb:20 run> terminated with exception (report_on_exception is true):
threads.rb:13:in `/': divided by 0 (ZeroDivisionError)
	from threads.rb:13:in `block in <main>'
	from threads.rb:23:in `block in <main>'
"Alive: false, Status: nil"

Question: Could you check whether the worker_thread is still alive before pushing new messages and, if it is not, start a new thread? The code example shows that the thread reports as dead after the exception, so this could be used as an indicator to create a new worker thread.
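The check proposed in the question above could look roughly like the following. This is a minimal sketch, not Bugsnag's actual thread queue: the class name `SelfHealingQueue` and its methods are hypothetical and only illustrate checking `Thread#alive?` before each push.

```ruby
# Hypothetical wrapper for illustration only; not Bugsnag's ThreadQueue.
class SelfHealingQueue
  attr_reader :worker

  def initialize
    @queue = Queue.new
    @mutex = Mutex.new
    @worker = nil
  end

  # Enqueue a job, starting (or restarting) the worker if it is not alive.
  def push(job)
    @mutex.synchronize do
      @worker = start_worker unless @worker&.alive?
    end
    @queue.push(job)
  end

  private

  def start_worker
    Thread.new do
      while (job = @queue.pop)
        job.call
      end
    end
  end
end

results = Queue.new
q = SelfHealingQueue.new
q.push(proc { results << 1 })
q.push(proc { 1 / 0 }) # kills the first worker

begin
  q.worker.join # wait until the first worker has died
rescue ZeroDivisionError
  # join re-raises the exception that terminated the thread
end

q.push(proc { results << 3 }) # the dead worker is replaced here
first = results.pop
second = results.pop
```

One limitation of this sketch: if the worker dies between the `alive?` check and the push, the job stays on the queue until the next push starts a new worker.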


mclack commented Mar 22, 2024

Hi @krisdigital

Thanks for raising this. We're looking into this and will update the thread as soon as we can.

mclack added the "needs discussion" label (Requires internal analysis/discussion) on Mar 22, 2024
clr182 added the "awaiting feedback" label (Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks.) and removed the "needs discussion" label on Mar 28, 2024

clr182 commented Mar 28, 2024

Hi @krisdigital

Thank you for your patience as we investigated this issue further.

The issue seems to stem from the worker thread terminating. Are you aware of any reasons why this may be happening? If so, could you please elaborate?

For background, the thread should stay alive until we stop it in an at_exit block:

at_exit do
  @configuration.warn("Waiting for #{@queue.length} outstanding request(s)") unless @queue.empty?
  @queue.push STOP
  worker_thread.join
end
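For readers unfamiliar with the pattern, the shutdown described above can be sketched in isolation. This is a simplified illustration under assumptions, not the library's code: the local `stop` stands in for the `STOP` constant in the snippet, and the shutdown is triggered directly rather than from an `at_exit` block.

```ruby
# A unique sentinel object; stands in for STOP in the snippet above (assumption).
stop = Object.new.freeze
queue = Queue.new
processed = []

worker = Thread.new do
  # Run jobs until the sentinel is popped, then leave the loop normally.
  until (job = queue.pop).equal?(stop)
    job.call
  end
end

queue.push(proc { processed << :a })
queue.push(proc { processed << :b })

# Shutdown: enqueue the sentinel behind any outstanding jobs, then wait.
queue.push(stop)
worker.join
```

Because the sentinel is queued behind the outstanding jobs, the worker drains them before exiting. This only works while the worker is alive; if it has already died, nothing consumes the queue.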

@krisdigital
Author

Hi @clr182,

thank you for looking into it! I don't know why the thread terminated in our case, sadly. It may have been a bad message in an exception? But it is hard to tell.

The problem is that in this case the error reporting silently stops working. Would it maybe make sense to check if the thread is still running when a new message is pushed on the queue?


clr182 commented Apr 5, 2024

Hi Kris,

We do believe the worker thread dying is the root cause of your issue in this case. As previously stated, this thread should always be alive until we stop it in an at_exit block. Perhaps you could implement some further logging to determine the cause of this dying thread and investigate further from your side?
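One way such logging could be added to a worker loop is sketched below. It is a generic Ruby illustration, not a Bugsnag API: the `logger`, `log_output`, and the loop itself are assumptions modelled on the example earlier in this thread, and the logger writes to a `StringIO` only to keep the sketch self-contained.

```ruby
require "logger"
require "stringio"

log_output = StringIO.new         # stand-in for a real log destination
logger = Logger.new(log_output)
queue = Queue.new

worker = Thread.new do
  # We log the failure ourselves, so silence Ruby's default report.
  Thread.current.report_on_exception = false
  begin
    while (job = queue.pop)
      job.call
    end
  rescue Exception => e # deliberately broad: record whatever ended the thread
    logger.error("worker thread died: #{e.class}: #{e.message}")
    logger.error(e.backtrace.join("\n")) if e.backtrace
    raise # re-raise so the thread still terminates with the original error
  end
end

queue.push(proc { 1 / 0 })

begin
  worker.join
rescue ZeroDivisionError
  # join re-raises the exception that terminated the worker
end
```

Note that a thread stopped with `Thread#kill` does not raise an exception, so this sketch would not record that case.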

@krisdigital
Author

Hi @clr182,

All right, thank you. We will continue to look for the reason the thread is exiting!


mclack commented Jun 3, 2024

Hi @krisdigital

As there hasn't been any activity on the thread for a while, we are now going to close this issue.

If you continue to experience issues with this, or have any other questions, please feel free to reopen this or open a ticket with us directly by contacting support@bugsnag.com with further details or relevant information.

@mclack mclack closed this as completed Jun 3, 2024
mclack removed the "awaiting feedback" label (Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks.) on Jun 3, 2024