
async option will be dropped. #1522

Open
st0012 opened this issue Aug 1, 2021 · 24 comments · Fixed by #1894
@st0012
Collaborator

st0012 commented Aug 1, 2021

The async option is unique to the Ruby SDK. It was designed to help send events asynchronously through different backends (e.g. a Ruby thread, a Sidekiq worker, etc.). Depending on the backend, it can pose a threat to the system due to its extra memory consumption. So it's an option with some trade-offs.

But since version 4.1, the SDK has had its own managed background worker (implemented with the popular concurrent-ruby library). It can handle most of the async option's use cases.

The async Option Approach

  1. The SDK serializes the event and event hint into JSON-compatible Ruby hashes.
  2. It passes the event payload and hint to the block.
  3. In general, the block would enqueue a background job with the above data.
    • Some earlier apps use a new Ruby thread to send the data. This is not recommended.
    • With background job libraries like Sidekiq or Resque, this means adding objects to Redis.
    • With delayed_job, this means adding a new delayed_job record.
  4. A background worker (e.g. a Sidekiq worker) then picks up the event and hint and sends them (see the sketch after this list).
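To make steps 3 and 4 concrete, here's a minimal sketch of the typical setup with Sidekiq. The worker class name SentryJob is hypothetical; the only SDK pieces assumed are config.async and Sentry.send_event, both mentioned above.

# config/initializers/sentry.rb
Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]
  # Steps 2-3: the SDK hands the JSON-compatible event hash and hint to this block,
  # which enqueues them on a background job system.
  config.async = lambda do |event, hint|
    SentryJob.perform_async(event, hint)
  end
end

# app/workers/sentry_job.rb -- hypothetical worker class
class SentryJob
  include Sidekiq::Worker

  def perform(event, hint)
    # Step 4: the worker picks the serialized event back up and sends it.
    Sentry.send_event(event, hint)
  end
end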

Pros

Users can customize their event-sending logic, but generally it's just a worker that calls Sentry.send_event(event, hint).

Cons

  • The event payload (usually dozens of KBs) could be copied twice: first into the intermediate storage, and then again when it's loaded by the background worker process.
  • When there is an event spike, it can flood the intermediate storage (Redis) and take down the entire system.

The Background Worker

  1. The SDK passes the event and its hint to the background worker (a pool of threads managed by concurrent-ruby).
  2. A worker thread then picks up the event, serializes it, and sends it (a conceptual sketch follows this list).
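The following is only a conceptual sketch of that flow, not the SDK's actual BackgroundWorker implementation; the pool settings mirror the defaults described in this issue (worker count follows the core count, queue capped at 30 events).

require "concurrent"

# A bounded thread pool roughly equivalent in spirit to the SDK's background worker.
pool = Concurrent::ThreadPoolExecutor.new(
  min_threads: 0,
  max_threads: Concurrent.processor_count, # default worker count follows the core count
  max_queue: 30,                           # the 30-event queue limit discussed below
  fallback_policy: :discard                # overflowing events are dropped rather than queued elsewhere
)

def enqueue_for_sending(pool, event, hint)
  posted = pool.post do
    # Serialization and the HTTP request happen off the calling thread.
    Sentry.send_event(event, hint)
  end
  # post returns false when the queue is full and the :discard policy kicks in.
  warn "event dropped: background worker queue is full" unless posted
end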

Pros

  • It doesn't allocate extra memory other than the original event payload.
  • It's faster.
  • It doesn't require any user code.
  • The background worker doesn't queue more than 30 events. So even when there's a spike, it's unlikely to consume all the memory.

Cons

  • Unsent events will die with the process. Generally speaking, the queue time in the background worker is very low, and the chance of missing events for this reason is small in web apps. But for scripts, the process often exits before the worker is able to send the event. This is why hint: { background: false } is required in the rake integration.
    • However, I don't think this problem can be solved with the async option either.

This drawback has been addressed in #1617.
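For context, that hint can be used like this in a one-off script. The script body (run_nightly_cleanup) is hypothetical; hint: { background: false } is the same option the rake integration relies on.

require "sentry-ruby"

Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]
end

begin
  run_nightly_cleanup # hypothetical script work
rescue => e
  # Bypass the background worker so the event is sent before the process exits.
  Sentry.capture_exception(e, hint: { background: false })
  raise
end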

Missing Events During A Spike Because of Queue Limit

I know many users are concerned that the background worker's 30-event queue limit will make them lose events during a spike. But as the maintainer and a user of this SDK, I don't worry about it because:

  1. The spike is likely to be an urgent case, and that'll probably be fixed in a short time. So not seeing a few instances of other errors should not affect the overall coverage.
  2. Given these characteristics of the SDK's background worker:
    • The default number of background worker threads is determined by the number of processor cores on your machine.
    • They're a lot faster than using the async approach with a Sidekiq/Resque/etc. worker, for the reasons described above.
    • A 30-event queue is shared only within a process/web instance, depending on your concurrency model; it's not a global limit.
      If there's a spike big enough to overflow the SDK's queue and drop some events, it would probably overflow your background job queue with the async option too, and/or do greater damage to your system.
  3. Sentry has a rate-limiting mechanism to prevent overload on the platform side. It works by rejecting new events and telling the SDK to stop sending, via a 429 response. When the SDK receives a 429 response from Sentry during a spike, it'll stop sending all events for a given period of time.

What I'm trying to say is: it's not possible to expect Sentry to accept all events during a big spike, regardless of which approach you use. But when a spike happens, async is more likely to become another bottleneck and/or cause other problems in your system.

My Opinion

The async option seems redundant now, and it can sometimes cause more harm than good. So I think we should drop it in version 5.0.

Questions

The above analysis is based only on my personal usage of the SDK and a few cases I've helped debug. So if you're willing to share your experience, I'd like to know:

Even though the decision has been made, we still would like to hear feedback about it:

  • Do you use the async option in your apps?
    • If you do, what's the motivation? Will you still use it after reading the above description?
    • If you don't, is it an intentional decision? If it is, what's the reason behind it?
  • Do you disable the background workers with the background_worker_threads config option? (see the config sketch after this list)
    • If you do, why?
  • Or any other feedback related to this topic.
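For reference, the two options these questions refer to live side by side in the initializer. A minimal sketch (the values are commented out and only illustrate the knobs; SentryJob is the hypothetical worker from the earlier sketch, and setting background_worker_threads to 0 makes the SDK send events on the current thread):

Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]

  # The option this issue proposes to drop:
  # config.async = lambda { |event, hint| SentryJob.perform_async(event, hint) }

  # The built-in background worker: a thread pool sized to the core count by default.
  # Setting it to 0 disables the pool, and events are sent synchronously instead.
  # config.background_worker_threads = 0
end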
@st0012 st0012 added this to To do in 4.x via automation Aug 1, 2021
@st0012 st0012 pinned this issue Aug 1, 2021
@st0012 st0012 added this to the 5.0.0 milestone Aug 1, 2021
@josh-m-sharpe

Long-time user of sentry-ruby and sentry-raven here. I didn't know this feature existed until an ongoing migration of a large Rails app to Sentry. I noticed it while reading through the docs and attempted to set it up, but I immediately ran into issues and elected to defer it to simplify the migration. In addition, we have a large number of complex queues; injecting this option into our system would likely require a bit of thought so as not to bowl over important queues while keeping error reporting timely.

I suppose this is a vote to drop it?

@st0012 st0012 added this to To do in 5.x Aug 6, 2021
@st0012 st0012 removed this from To do in 4.x Aug 6, 2021
@louim
Contributor

louim commented Aug 20, 2021

Hey! We currently use the async option, mostly because it was recommended in the docs when we set up the app a long time ago (not sure the background worker option was even a thing at that time 👴🏼).

I'm curious about that part:

The background worker doesn't queue more than 30 events. So even when there's a spike, it's unlikely to consume all the memory.

What would happen if there is a spike in events, let's say from a noisy error filling the queue? Would another error happening at the same time be silently dropped because the queue is full? Or am I misunderstanding how it works?

I'd like to switch away from async because the JSON serialization in the async path means we have to check two versions of the payload when doing custom processing in before_send, e.g.:

data = event.to_hash
# symbol keys when the event comes straight from the SDK
errors = data.dig(:exception, :values) if data[:exception]
# string keys after the async JSON round-trip
errors = data.dig("exception", "values") if data["exception"]

@st0012
Collaborator Author

st0012 commented Aug 21, 2021

not sure the background worker option was even a thing at that time 👴🏼

The background worker was added in v4.1.0, so it's a new thing for most users, I think 🙂

What would happen if there is a spike in events, let's say from a noisy error filling the queue? Would another error happening at the same time be silently dropped because the queue is full? Or am I misunderstanding how it works?

As of now, the queue doesn't distinguish between events. So if there's a spike of a particular error, other errors may not make it into the queue. But personally I'm not worried about this, because:

  1. The spike is likely to be an urgent case, and that'll probably be fixed in a short time. So not seeing a few instances of other errors should not affect the overall coverage.
  2. Given these characteristics of the SDK's background worker:
    • The default number of background worker threads is determined by the number of processor cores on your machine.
    • They're a lot faster than using the async approach with a Sidekiq/Resque/etc. worker, for the reasons I described in the issue.
    • A 30-event queue is shared only within a process/web instance, depending on your concurrency model; it's not a global limit.

If there's a spike big enough to overflow the SDK's queue and drop some events, it would probably overflow your background job queue with the async option too, and/or do greater damage to your system.

@kzaitsev
Contributor

At my job, we're doing fine with async because we already have Sidekiq in our stack. I don't understand why async can't remain an optional way to deliver events instead of being deprecated. As a solution, you could highlight the possible issues with async in the documentation.

Maybe I don't see the problem because we use Sentry only for exceptions, without APM.

@st0012
Collaborator Author

st0012 commented Sep 27, 2021

@Bugagazavr

There are two main costs to keeping this option around:

  1. Among all Sentry SDKs, sentry-ruby is the only one that supports such an option. This means we always need to consider this extra condition when making changes to the event-sending logic, and it'll make future SDK alignment harder. (It certainly made the sentry-raven -> sentry-ruby conversion harder.)
  2. We need to spend additional effort maintaining the code for this option, and sometimes the result isn't pretty:

def dispatch_async_event(async_block, event, hint)
  # We have to convert to a JSON-like hash, because background job
  # processors (esp ActiveJob) may not like weird types in the event hash
  event_hash = event.to_json_compatible

  if async_block.arity == 2
    hint = JSON.parse(JSON.generate(hint))
    async_block.call(event_hash, hint)
  else
    async_block.call(event_hash)
  end
rescue => e
  loggable_event_type = event_hash["type"] || "event"
  log_error("Async #{loggable_event_type} sending failed", e, debug: configuration.debug)
  send_event(event, hint)
end

if defined?(ActiveJob)
  module Sentry
    parent_job =
      if defined?(::ApplicationJob) && ::ApplicationJob.ancestors.include?(::ActiveJob::Base)
        ::ApplicationJob
      else
        ::ActiveJob::Base
      end

    class SendEventJob < parent_job
      # the event argument is usually large and creates noise
      self.log_arguments = false if respond_to?(:log_arguments=)

      # this will prevent infinite loop when there's an issue deserializing SentryJob
      if respond_to?(:discard_on)
        discard_on ActiveJob::DeserializationError
      else
        # mimic what discard_on does for Rails 5.0
        rescue_from ActiveJob::DeserializationError do |exception|
          logger.error "Discarded #{self.class} due to a #{exception}. The original exception was #{exception.cause.inspect}."
        end
      end

      def perform(event, hint = {})
        Sentry.send_event(event, hint)
      end
    end
  end
else
  module Sentry
    class SendEventJob; end
  end
end

If the upside were high, these costs wouldn't be an issue; that's why we kept it for many years. But since we already have a better solution to the problem (the background worker) with far fewer downsides, I don't think it's worth it now.

rsanheim added a commit to simpledotorg/simple-server that referenced this issue Sep 28, 2021
…rker (#2951)

* Drop sentry async - use Sentry's builtin background worker

See getsentry/sentry-ruby#1522 for details and
why we want to do this change.

We suspect this also may be a cause of https://app.shortcut.com/simpledotorg/story/5282/redis-down-in-production

* Explicitly turn off Sentry tracing

We are using Datadog for tracing currently, and having two tools
capturing trace info is not necessary and may add some overhead we don't
want.

* Revert "Explicitly turn off Sentry tracing"

This reverts commit e1eae44.
@github-actions

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@benoittgt

We were using a custom Sidekiq task in an async block, but multiple times we ran into this issue:

When there is an event spike, it can flood the intermediate storage (Redis) and take down the entire system.

And it was a nightmare.

We just switched to the default by removing async and using the BackgroundWorker that uses concurrent-ruby. It works perfectly for us for the moment. I will post a message on the next spike. 😄

@trevorturk

I think removing async seems fine. I was using it to queue up a small custom ActiveJob that just called Sentry, but I'm happy to use the built-in method. I agree event spikes are fine to cap at X events, since Sentry would drop them anyway and you don't want to overwhelm your system. You might consider telling users in the docs that they can just remove the async line and things should work fine. I'm testing that now to be sure, but I believe that's the recommendation?

@st0012 st0012 modified the milestones: 5.0.0, 6.0.0 Jan 7, 2022
@st0012
Collaborator Author

st0012 commented Jan 15, 2022

@trevorturk Thanks for the feedback. The current docs on Sentry and the RDoc both state that it's a deprecated option and point users to this issue for more info 😄

I believe that's the recommendation?

Yes, you should be able to delete it directly without any issue. But if you do hit one, please report it here and I'll fix it ASAP 😉

@trevorturk

It all worked fine when I just deleted it, thank you!

@dillonwelch

But for scripts, the process often exits before the worker is able to send the event. This is why hint: { background: false } is required in the rake integration.

Can you elaborate on why this is and why this wouldn't be in https://github.com/getsentry/sentry-ruby/blob/efcf170b5f6dd65c3b047825bddd8fde87fc6b7b/sentry-ruby/lib/sentry/rake.rb?

@st0012
Collaborator Author

st0012 commented Jan 27, 2022

@dillonwelch Sorry, I forgot to update the description. That issue has been addressed in #1617 and doesn't exist anymore 🙂

@jordan-brough

jordan-brough commented Feb 11, 2022

For instrumenting AWS Lambda, it seems like it'd be ideal to use Lambda Extensions to deliver error reports. At first glance it looks like the async option would be the way to integrate with Lambda Extensions. Do any of the Sentry libraries for other languages hook into Lambda Extensions? Would there be another way to do that with the Sentry Ruby SDK?

@st0012
Collaborator Author

st0012 commented Feb 12, 2022

@jordan-brough I'm not familiar with AWS Lambda, but it looks like Sentry's Lambda Extension doesn't need language SDKs to work? It seems to integrate directly with AWS Lambda.
Can you explain your use case in more detail?

@sl0thentr0py
Member

sl0thentr0py commented Feb 14, 2022

@jordan-brough currently, AWS Lambda on Ruby is not a first-class feature like it is for Python/Node, where we have separate packages/integrations/layers implemented. AWS extensions are also something we are looking into right now for both of those ecosystems. Ruby is not on the road map per se, but we would be open to seeing if that's something the community wants. I'm making a new issue to gauge interest since it's tangential here.

@jordan-brough

@sl0thentr0py thanks 👍 I've commented over in that issue.

@jordan-brough

@st0012 The way Lambda works is that AWS spins up an instance of some code you've written and invokes a "handler" method with an "event" payload (e.g. a web request). And then:

After the function ... [has] completed, Lambda maintains the execution environment for some time in anticipation of another function invocation. In effect, Lambda freezes the execution environment. When the function is invoked again, Lambda thaws the environment for reuse

https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html

So you have some code that runs, then gets "frozen", and then maybe gets "unfrozen" again in the future to handle further events (e.g. web requests). But if you deploy new code, or if there is too long of a delay between events, then Lambda discards the frozen execution environment.

So if you have some async code you may end up in this situation:

  • Run the main "handler" (application) code
  • Start an async process along the way (e.g. sending events to Sentry)
  • Finish the handler code before the async code finishes (or perhaps even starts)
  • Lambda gets frozen

And then your async code may never run if Lambda ends up discarding the frozen execution environment. And Sentry events would get lost.

So currently the way to make Lambda work reliably with Sentry would be to make Sentry operate synchronously. But then you have Sentry possibly slowing down your application code and/or potentially affecting it if there were Sentry bugs/outages etc (even though I'm sure that would never happen! 😉). Which I assume is one reason why Sentry runs asynchronously in the first place.

Last year AWS released a solution to this issue called "Lambda Extensions". You can use Lambda Extensions to let a Lambda function handle application code synchronously while also enqueuing asynchronous events to "extensions" (e.g. a Sentry extension) that don't block the main application code.

A configuration option like the async config discussed in this issue might be a good way to integrate with that, though I haven't looked into the code here or the details of that in depth.

@hogelog

hogelog commented Mar 13, 2022

I agree with dropping the async option.

I'm developing a Rails app that uses config.async with Sidekiq, but I will disable this config and use the background worker or send synchronously.

I was originally aware of this issue and was considering disabling config.async. However, before we disabled it, an error spike took down Sidekiq's Redis.

I plan to disable config.async and use the background worker.
I am a little concerned about the queue limit. However, I checked the latency of our Rails app and of the error-reporting HTTP request (to sentry.io), and queue overflow will probably not occur.

It would be more reassuring to be able to monitor queue overflow.
sentry-ruby records a queue overflow event and sends it as a client report:
https://github.com/getsentry/sentry-ruby/blob/5.2.0/sentry-ruby/lib/sentry/client.rb#L63
It would be great if these queue overflow events were visualized in sentry.io's UI.

Sending errors synchronously doesn't look like a bad approach either.
Our Rails app's APM reports that the latency of HTTP requests to sentry.io is almost always within 300ms. I feel it's not that much of a problem for Rack workers to spend that amount of time when an error occurs.

@sl0thentr0py
Member

sl0thentr0py commented Mar 14, 2022

It would be great if these queue overflow events were visualized in sentry.io's UI.

@hogelog this is on the product road map, but I can't give you an ETA on when it will ship yet. We certainly want to expose these statistics to the user eventually.

@st0012
Collaborator Author

st0012 commented Mar 14, 2022

Sending errors synchronously doesn't look like a bad approach either.

@hogelog I should also mention that if you have tracing enabled, transaction events need to be taken into consideration as well, so I wouldn't recommend sending errors synchronously if you do.

@hogelog

hogelog commented Mar 15, 2022

It would be great if these queue overflow events were visualized in sentry.io's UI.

@hogelog this is on the product road map, but I can't give you an ETA on when it will ship yet. We certainly want to expose these statistics to the user eventually.

I'm looking forward to it!

@hogelog I should also mention that if you have tracing enabled, transaction events need to be taken into consideration as well, so I wouldn't recommend sending errors synchronously if you do.

I'm not using tracing, so I wasn't aware of that. Considering the future, it may not be good to send events synchronously. Thanks!

@ariccio

ariccio commented May 24, 2022

I greatly appreciate this extra warning message! It's a good warning. Can you tell me exactly what I should do? Should I simply remove this code from my codebase, or do I need to add something else?

  config.async = lambda do |event, hint|
    ::Sentry::SendEventJob.perform_later(event, hint)
  end

@st0012
Collaborator Author

st0012 commented May 28, 2022

@ariccio You can simply delete it 😉

@vadviktor

My company has been plagued by the async feature for 1-2 years now; hitting the payload size limit and the rate limit made us scratch our heads about how to safeguard against them (before Sentry remedied them internally). How did we catch these issues? By seeing Sidekiq brought to its knees and important messages going unprocessed.

Right now, we have decided to end the async reign, and I was glad to see that this feature has been deprecated since our last version update. 🙌

ragesoss added a commit to WikiEducationFoundation/WikiEduDashboard that referenced this issue Aug 8, 2022
This feature will be removed in a future version of Sentry, and it might be a source of some Redis memory problems (which have been and maybe still are a problem for P&E Dashboard). See getsentry/sentry-ruby#1522
shanamatthews pushed a commit to getsentry/sentry-docs that referenced this issue Aug 2, 2023