Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronously drain async error reporting queue #1612

Closed
iloveitaly opened this issue Nov 8, 2021 · 7 comments · Fixed by #1617
Closed

Synchronously drain async error reporting queue #1612

iloveitaly opened this issue Nov 8, 2021 · 7 comments · Fixed by #1617
Assignees
Projects
Milestone

Comments

@iloveitaly
Copy link
Contributor

If you are using a background queue like Resque, threads are killed after a job completes. In this scenario, I believe errors would fail to send to sentry.

Is there a way to synchronously drain the async error reporting queue? This could be added to a before exit hook on the resque side.

@st0012 st0012 added this to To do in 4.x via automation Nov 8, 2021
@st0012
Copy link
Collaborator

st0012 commented Nov 8, 2021

I'm aware of the problem you mentioned here, but it depends on how the errors are captured.
For exceptions captured by the sentry-resque gem, they'll be sent immediately and doesn't enter the background worker. So the scenario doesn't affect those events.
But for manually captured events, they may be lost if the process is killed before they're sent as you said. I've tried to tackle this before by applying wait_for_termination to the thread pool executor:

Concurrent::ThreadPoolExecutor.new(
min_threads: 0,
max_threads: @number_of_threads,
max_queue: @max_queue,
fallback_policy: :discard
)

But I think it didn't work. Other than that, I don't have much option to force concurrent-ruby finishing all the jobs.

@iloveitaly
Copy link
Contributor Author

@st0012 thanks for the snappy reply!

Yeah, it's a tricky problem.

Could we add something along the lines of drain_and_shutdown to BackgroundWorker? It could:

  • Send a shutdown signal to the thread pool. I'm not sure, but I imagine this would signal to the threads to finish up their current work and terminate?
  • Call wait_for_termination with a configurable timeout?

This would enable the resque user to do something like:

    module ResqueHooks
      def after_perform_send_sentry(*_args)
        Sentry::BackgroundWorker.drain_and_shutdown(1)
      end
    end

@st0012
Copy link
Collaborator

st0012 commented Nov 8, 2021

It's not that I don't want to provide such API, but the underneath concurrent-ruby doesn't seem to have one (document). Perhaps I'm missing something here?
If you can find an API for that, I'll try. I'll also consider a different thread pool worker if there's any?

@iloveitaly
Copy link
Contributor Author

I'm no concurrent-ruby expert, but I think this three-step flow would work?

image

@st0012
Copy link
Collaborator

st0012 commented Nov 9, 2021

Last time I checked this issue on concurrent-ruby I found this: ruby-concurrency/concurrent-ruby#841. From the PR, it looks like the current implementation should replace the manual shutdown procedure with at_exit block. So I assume this will not do much of the difference. But I'll check this again recently.

@iloveitaly
Copy link
Contributor Author

Here's what I'm testing out:

Sentry::BackgroundWorker.class_eval do
  def drain_and_shutdown(timeout=1)
    return if @executor.class != Concurrent::ThreadPoolExecutor

    @executor.shutdown
    return if @executor.wait_for_termination(timeout)
    @executor.kill
  end
end

Shutdown process copied from:
https://github.com/ruby-shoryuken/shoryuken/blob/0d89335abca86021a9d9036a93d43233aa8b061c/lib/shoryuken/launcher.rb#L26-L33

Here's what the resque hooks look like:

    module ResqueHooks
      def after_perform_send_errors(*_args)
        Sentry.background_worker.drain_and_shutdown
      end

      def on_failure_send_errors(*_args)
        Sentry.background_worker.drain_and_shutdown
      end
    end

@st0012
Copy link
Collaborator

st0012 commented Nov 13, 2021

@iloveitaly thanks for digging into this issue and even provide a sample code for the solution 🙏
with the information you provided, I think the SDK should handle the shutdown instead of letting you define custom callbacks. so I added #1617 that hopefully can solve the issue. can you give it a try?

@st0012 st0012 added this to the 5.0.0 milestone Nov 17, 2021
@st0012 st0012 modified the milestones: 5.0.0, 4.8.1 Nov 19, 2021
4.x automation moved this from To do to Done Nov 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
4.x
  
Done
Development

Successfully merging a pull request may close this issue.

2 participants