Non-actionable warnings about RTT #4851

MGPalmer · 2021-03-24T09:50:20Z

Ruby version: 2.7.2
Rails version: 6.1
Sidekiq / Pro / Enterprise version(s): 6.2.0

sidekiq.yml:

---
:verbose: true
:concurrency: <%= ENV.fetch('SIDEKIQ_CONCURRENCY', 5) %>

# Set timeout to 8 on Heroku, longer if you manage your own systems.
:timeout: 8
:queues:
  - ['critical', 6]
  - ['high', 4]
  - ['default', 2]
  - ['low', 1]

Hello!
I'm a little worried about the recently introduced warnings about RTT: #4824
I noticed this warning showing up in our logs several times a day, usually with values in this range:

Mar 23 23:32:58 mge-application app/worker.1 pid=4 tid=2lps WARN: Your Redis network connection is performing extremely poorly.
Mar 23 23:32:58 mge-application app/worker.1 Current RTT is 95404 µs, ideally this should be < 1000.
Mar 23 23:32:58 mge-application app/worker.1 Ensure Redis is running in the same AZ or datacenter as Sidekiq.

However, usually the RTT is somewhere between 800 and 5000. The thing is, we're on Heroku and have basically no control over the Redis instance except the plan size (we have a medium-range "premium-5" instance). Sidekiq jobs seem to be handled reliably and speedily, no complaints.

So this warning is currently just noise to us. Is there a way to turn it off? Or am I missing something important here?

Thanks!
(BTW, love Sidekiq and your work, thanks :))

jwilsjustin · 2021-03-24T14:37:17Z

Yeah. For deployments on Heroku there is no way for us to have a guaranteed AZ. See here. Maybe logging this as INFO would still suffice for those who want it?

MGPalmer · 2021-03-24T15:47:58Z

Ah, thanks. Hmm, the principle is fine :) Off the top of my head, I would like to keep the warnings but configure the threshold for our case. Making it possible to override RTT_WARNING_LEVEL via an ENV var and/or sidekiq.yml would be great for us.

mperham · 2021-03-24T16:11:22Z

Perhaps I shouldn't be taking one reading and WARNing based on it. I should be taking 3-5 readings over 30 seconds before logging anything; that would minimize log noise due to transient spikes.

I avoid config switches as they add code complexity.

jwilsjustin · 2021-03-24T16:22:44Z

+1 for that, @mperham.

MGPalmer · 2021-03-24T16:22:51Z

I guess that would work for us, too. Today's warnings, for example, are usually anywhere from a few minutes to an hour apart.

PhilCoggins · 2021-03-30T22:49:15Z

I am also on Heroku and have just started to notice these in my logs. Some of the values are very high:

Mar 29 07:14:46 fleetio app/sidekiq.2: Current RTT is 16703645 µs, ideally this should be < 1000.

If I'm not mistaken, this is a 16 second ping (not full request) from my Sidekiq server to Redis? I have opened a support request with Heroku, as this is pretty bad.

Would it be reasonable to correlate consistently high values with ERROR: heartbeat: Connection timed out? And is it possible for jobs to be dropped when seeing these errors?

UPDATE: I averaged the RTT values in my logs over the past 24 hours and came up with 247302.
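
For anyone who wants to repeat that averaging on their own logs, here is a rough sketch. It assumes the log lines contain the "Current RTT is N µs" text shown above; the method names (`rtt_values`, `average_rtt_us`, `us_to_seconds`) are made up for illustration and are not part of Sidekiq.

```ruby
# Hypothetical helpers for summarising RTT warnings found in log lines.
# None of these names come from Sidekiq; they are illustrative only.

# Matches the numeric reading in a "Current RTT is N µs" warning line.
RTT_PATTERN = /Current RTT is (\d+)/

# Returns the RTT readings (in microseconds) found in the given lines,
# skipping lines that do not contain a reading.
def rtt_values(lines)
  lines.filter_map { |line| line[RTT_PATTERN, 1]&.to_i }
end

# Returns the mean RTT in microseconds, or nil when there are no readings.
def average_rtt_us(lines)
  values = rtt_values(lines)
  return nil if values.empty?

  values.sum / values.size.to_f
end

# Converts microseconds to seconds.
def us_to_seconds(microseconds)
  microseconds / 1_000_000.0
end
```

With this, `us_to_seconds(16_703_645)` gives roughly 16.7, which matches the "16 second ping" reading above. Note that this only averages the readings that were bad enough to be logged, so it says nothing about the typical RTT between warnings.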

mperham · 2021-03-30T23:06:42Z

@PhilCoggins That's awful. If you are seeing consistently poor performance, I would explain to Heroku Support about the poor latency and ask them to fail you over to a new Redis instance. Something is terribly wrong with that one.

mperham · 2021-03-30T23:48:27Z

I've updated master to take 5 samples and only warn if all five samples are above the threshold.
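
To illustrate the idea (this is a minimal sketch, not the code in 5b94bfe): keep the last five readings and only report a problem when every one of them is above the threshold. The class name `RttMonitor`, the helper `measure_us`, and the 50,000 µs default are assumptions made for this example.

```ruby
# Hypothetical sketch of "warn only if all N samples are slow".
# Names and the default threshold are illustrative, not Sidekiq's API.
class RttMonitor
  SAMPLE_COUNT = 5
  WARNING_LEVEL_US = 50_000 # assumed threshold, in microseconds

  def initialize(sample_count: SAMPLE_COUNT, warning_level_us: WARNING_LEVEL_US)
    @warning_level_us = warning_level_us
    # Start with zeros so a warning needs sample_count real slow readings.
    @samples = Array.new(sample_count, 0)
    @index = 0
  end

  # Records one reading in microseconds. Returns true only when every
  # retained sample exceeds the threshold, so one spike never warns.
  def record(rtt_us)
    @samples[@index] = rtt_us
    @index = (@index + 1) % @samples.size
    return false unless @samples.all? { |sample| sample > @warning_level_us }

    # Reset so the next warning needs a fresh run of slow readings.
    @samples.fill(0)
    true
  end
end

# Times the given block in microseconds using the monotonic clock.
def measure_us
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond) - start
end

# Usage, where `redis` stands in for whatever client object you have:
#   monitor = RttMonitor.new
#   warn "Redis RTT is consistently high" if monitor.record(measure_us { redis.ping })
```

With this shape, a single 95404 µs spike followed by normal readings never triggers a warning, while five slow readings in a row do.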

MGPalmer · 2021-03-31T07:28:08Z

Thanks everyone!

edmorley · 2021-04-06T13:15:51Z

@mperham I don't suppose it would be possible to publish a new sidekiq release to pick up 5b94bfe? A number of customers are opening tickets, and presumably many (if not all) are due to transient spikes rather than consistently slow RTT.

mperham · 2021-04-06T14:48:35Z

@edmorley Can you explain more? Is there some aspect that makes this high priority? I have one other thing I'm still looking into but it's possible I can release later this week.

edmorley · 2021-04-06T14:58:37Z

@mperham Just that the message in 6.2.0 can be the result of a temporary false positive, rather than a consistently high RTT, and the new sampling approach will eliminate the noise from those. Customers open tickets with "sidekiq says there is a problem with my Redis instance", and after investigation there is no issue with the Redis instance, and the ping is typically low.

mperham · 2021-04-06T15:23:42Z

Got it, thanks for the feedback. I need to remember that with great power comes great responsibility, sorry for the support noise. I will ship 6.2.1 tomorrow.

mperham · 2021-04-08T00:30:12Z

6.2.1 is out.

edmorley · 2021-04-08T08:29:27Z

Thank you :-)

soma · 2021-04-08T10:13:09Z

❤️

MGPalmer · 2021-04-08T10:33:34Z

Looks like it's working :) Thanks!

mperham added a commit that referenced this issue Mar 30, 2021

Update RTT warning to use multiple samples, #4851

5b94bfe

TIL about Array#fill

mperham closed this as completed Mar 30, 2021
