
Reduce RTT readings warnings #74859

Closed
2 tasks
RachalCassity opened this issue Jan 30, 2024 · 3 comments · Fixed by department-of-veterans-affairs/vets-api#15725

Comments

@RachalCassity
Member

Discovery

Look into possible causes for the following warning.

  • Ensure all Redis cluster configurations are correct.
  • Look into possible memory leaks.

Warning

Your Redis network connection is performing extremely poorly.
Last RTT readings were [54881, 100896, 98796, 53572, 199853], ideally these should be < 1000.
Ensure Redis is running in the same AZ or datacenter as Sidekiq.
If these values are close to 100,000, that means your Sidekiq process may be
CPU-saturated; reduce your concurrency and/or see https://github.com/mperham/sidekiq/discussions/5039
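The numbers in this warning are Redis PING round-trip times, which Sidekiq reports in microseconds (so 100,000 is about 100 ms, and the < 1000 target is 1 ms). As a rough illustration of why the warning fires, here is a minimal Ruby sketch of such a check. It assumes, based on recent Sidekiq versions rather than anything confirmed in this issue, that the last five readings are kept and the warning is logged only when all five exceed 50,000 µs. `RttMonitor`, `WINDOW`, and `WARNING_LEVEL_US` are hypothetical names, not Sidekiq's API.

```ruby
# Hypothetical sketch of a Sidekiq-style Redis RTT check; not Sidekiq's own code.
# Readings are assumed to be microseconds (1,000 us = 1 ms).
class RttMonitor
  WINDOW = 5                # assumed: number of recent readings kept
  WARNING_LEVEL_US = 50_000 # assumed: per-reading threshold (50 ms)

  def initialize
    @readings = []
  end

  # Records one reading. Returns the window of readings (for logging) when
  # every reading in a full window exceeds the threshold; otherwise nil.
  def record(rtt_us)
    @readings << rtt_us
    @readings.shift while @readings.size > WINDOW
    return nil unless @readings.size == WINDOW
    return nil unless @readings.all? { |r| r > WARNING_LEVEL_US }

    triggered = @readings.dup
    @readings.clear # start a fresh window so one bad stretch logs only once
    triggered
  end
end

# The five readings from the warning above, in microseconds.
logged = [54881, 100896, 98796, 53572, 199853]
monitor = RttMonitor.new
results = logged.map { |rtt| monitor.record(rtt) }

puts "warning window: #{results.last.inspect}"
puts "in milliseconds: #{logged.map { |us| us / 1000.0 }.inspect}"
```

Requiring a full window of slow readings means a single slow PING does not trigger the warning; a sustained run like the one logged here does. A commonly cited explanation for readings clustering near 100,000 µs is Ruby's global VM lock: while a CPU-bound job holds it, the thread sending the PING can wait roughly one default 100 ms thread timeslice before it runs again.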

Logs

https://vagov.ddog-gov.com/account/login?next=%2Flogs%3Fquery%3Denv%253Aeks-prod%2520service%253Avets-api%2520%2522CPU%2522%26cols%3Dhost%252Cservice%252Cpod_name%26event%3DAgAAAY1W2ayabC9a9gAAAAAAAAAYAAAAAEFZMVcyYkp6QUFDMjhBdllvQ3d4Q1FCRgAAACQAAAAAMDE4ZDU2ZGEtYjE1Mi00NmJmLWFhYTMtZjEyMmI2Yjk4NjFk%26index%3D%252A%26messageDisplay%3Dinline%26refresh_mode%3Dsliding%26stream_sort%3Ddesc%26viz%3Dstream%26from_ts%3D1706556427000%26to_ts%3D1706560027000%26live%3Dtrue

Sidekiq Discussions

sidekiq/sidekiq#4851

jennb33 added the needs-grooming, Analytics, operations labels on Feb 8, 2024
@LindseySaari
Contributor

@jennb33 Let's check with the Reliability team to make sure efforts aren't being duplicated.

@jennb33
Contributor

jennb33 commented Feb 23, 2024

@ericboehs is this work that Reliability team is already doing? Please advise, thanks!

@ericboehs
Contributor

Redis RTT

  • CPU on the pod is pegged at 50% from about 40 minutes after the pod is created until it is terminated (~23 hours).
  • (screenshot)
  • It appears to be EVSS::FailedClaimsReport (JIDs cbd6bd6880a7acdb3610f44d and 7655b0bbb1236af4b9f727aa).
    (screenshot)
  • It looks like 5 containers are affected:
    (screenshot)
  • Looking at which jobs are running in Sidekiq, the 5 containers with high CPU are all stuck running EVSS::FailedClaimsReport.
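The diagnosis above amounts to comparing each busy Sidekiq thread's job class and start time against how long jobs are expected to run. Below is a minimal Ruby sketch of that filter, assuming a snapshot of running jobs (pod, JID, class, start time) like the one the Sidekiq Web UI's Busy page displays. `RunningJob`, `long_running_by_class`, the one-hour threshold, and the sample data are all hypothetical and made up for illustration.

```ruby
# Hypothetical sketch: flag job classes that have been running "too long"
# across Sidekiq processes. The snapshot shape is an assumption for
# illustration, not the structure returned by Sidekiq's API.
RunningJob = Struct.new(:pod, :jid, :job_class, :started_at, keyword_init: true)

# Returns { job_class => [RunningJob, ...] } for jobs that have been
# running longer than max_runtime seconds as of `now`.
def long_running_by_class(jobs, now:, max_runtime:)
  jobs
    .select { |job| now - job.started_at > max_runtime }
    .group_by(&:job_class)
end

# Made-up sample data: five pods stuck on one class, one short-lived job.
now = Time.utc(2024, 2, 28, 13, 30)
stuck = (1..5).map do |i|
  RunningJob.new(pod: "pod-#{i}", jid: "jid-#{i}",
                 job_class: "EVSS::FailedClaimsReport", started_at: now - 20 * 3600)
end
healthy = RunningJob.new(pod: "pod-6", jid: "jid-6",
                         job_class: "SomeShortJob", started_at: now - 30)

flagged = long_running_by_class(stuck + [healthy], now: now, max_runtime: 3600)
flagged.each do |klass, jobs|
  puts "#{klass}: #{jobs.size} job(s) on #{jobs.map(&:pod).uniq.join(', ')}"
end
```

Grouping by class is what surfaces a pattern like the one in this issue, where several containers are each stuck on the same job rather than on unrelated slow jobs.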

ericboehs added a commit to department-of-veterans-affairs/vets-api that referenced this issue Feb 28, 2024
The EVSS::FailedClaimsReport was causing high CPU usage and preventing
other jobs from running (getting missed due to high Redis RTT times).

Fixes: department-of-veterans-affairs/va.gov-team#74859
ericboehs self-assigned this on Feb 28, 2024
jennb33 removed the needs-grooming label on Mar 6, 2024
4 participants