Skip to content

Commit

Permalink
Log poor Redis RTT, fixes #4824
Browse files Browse the repository at this point in the history
  • Loading branch information
mperham committed Feb 22, 2021
1 parent 50740f2 commit 285d654
Show file tree
Hide file tree
Showing 4 changed files with 35 additions and 5 deletions.
1 change: 1 addition & 0 deletions Changes.md
Expand Up @@ -5,6 +5,7 @@
HEAD
---------

- Store Redis RTT and log if poor [#4824]
- Add process/thread stats to Busy page [#4806]
- Refactor Web UI session usage. [#4804]
Numerous people have hit "Forbidden" errors and struggled with Sidekiq's
Expand Down
7 changes: 4 additions & 3 deletions lib/sidekiq/api.rb
Expand Up @@ -791,12 +791,12 @@ def each
# you'll be happier this way
conn.pipelined do
procs.each do |key|
conn.hmget(key, "info", "busy", "beat", "quiet", "rss")
conn.hmget(key, "info", "busy", "beat", "quiet", "rss", "rtt_us")
end
end
}

result.each do |info, busy, at_s, quiet, rss|
result.each do |info, busy, at_s, quiet, rss, rtt|
# If a process is stopped between when we query Redis for `procs` and
# when we query for `result`, we will have an item in `result` that is
# composed of `nil` values.
Expand All @@ -806,7 +806,8 @@ def each
yield Process.new(hash.merge("busy" => busy.to_i,
"beat" => at_s.to_f,
"quiet" => quiet,
"rss" => rss.to_i))
"rss" => rss.to_i,
"rtt_us" => rtt.to_i))
end
end

Expand Down
26 changes: 26 additions & 0 deletions lib/sidekiq/launcher.rb
Expand Up @@ -153,6 +153,8 @@ def ❤
end
end

rtt = check_rtt

fails = procd = 0
kb = memory_usage(::Process.pid)

Expand All @@ -163,6 +165,7 @@ def ❤
conn.hmset(key, "info", to_json,
"busy", curstate.size,
"beat", Time.now.to_f,
"rtt_us", rtt,
"quiet", @done,
"rss", kb)
conn.expire(key, 60)
Expand All @@ -185,6 +188,29 @@ def ❤
end
end

RTT_WARNING_LEVEL = 50_000

def check_rtt
a = b = 0
Sidekiq.redis do |x|

This comment has been minimized.

Copy link
@nateberkopec

nateberkopec Oct 14, 2021

Collaborator

@mperham Because the ping releases the GVL and other Sidekiq threads can do work in the background between lines 196 and 198, a fully-utilized Sidekiq server will constantly complain about its Redis RTT. I think this is misleading and could cause people to spend a lot of time debugging their Redis install when it's really just the case that their Sidekiq threads are completely loaded.

This can be reproduced by any program that causes a Sidekiq process' CPU utilization to hit ~100%.

This comment has been minimized.

Copy link
@mperham

mperham Oct 14, 2021

Author Collaborator

That sounds like a feature, not a bug. Perhaps the message needs to be crafted to point this out.

This comment has been minimized.

Copy link
@nateberkopec

nateberkopec Oct 14, 2021

Collaborator

I'm not sure it is a feature in a lot of cases. A lot of situations would be perfectly happy with a Sidekiq chugging along at nearly-100% CPU usage, if what they care about is maximizing throughput.

Currently I'm going to have to explain in Sidekiq in Practice that people are going to see this warning but that they should ignore it 🤷

This comment has been minimized.

Copy link
@mperham

mperham Oct 14, 2021

Author Collaborator

Agreed that this might be mistaking Redis latency for thread scheduling latency.

Can you open an issue and we'll talk about improvements here? I think it is worthwhile to have a signal which says "Your Sidekiq process is overloaded, reduce concurrency". Note the warning only shows if >100ms five times in a row, implying a heavily loaded process, so I think better advice is to reduce concurrency.

a = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC, :microsecond)
x.ping
b = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC, :microsecond)
end
rtt = b - a
# Ideal RTT for Redis is < 1000µs
# Workable is < 10,000µs
# Log a warning if it's a disaster.
if rtt > RTT_WARNING_LEVEL
Sidekiq.logger.warn <<-EOM
Your Redis network connection is performing extremely poorly.
Current RTT is #{rtt} µs, ideally this should be < 1000.
Ensure Redis is running in the same AZ or datacenter as Sidekiq.
EOM
end
rtt
end

MEMORY_GRABBER = case RUBY_PLATFORM
when /linux/
->(pid) {
Expand Down
6 changes: 4 additions & 2 deletions test/test_launcher.rb
Expand Up @@ -49,9 +49,11 @@ def new_manager(opts)
it 'stores process info in redis' do
subject.heartbeat

workers = Sidekiq.redis { |c| c.hmget(subject.identity, 'busy') }
workers, rtt = Sidekiq.redis { |c| c.hmget(subject.identity, 'busy', 'rtt_us') }

assert_equal ["1"], workers
assert_equal "1", workers
refute_nil rtt
assert_in_delta 1000, rtt.to_i, 1000

expires = Sidekiq.redis { |c| c.pttl(subject.identity) }

Expand Down

0 comments on commit 285d654

Please sign in to comment.