Better handling of Redis connection issues #707

Closed
rgalanakis opened this issue Jan 7, 2024 · 6 comments
Labels
bug Something is not working right/as expected.

Comments

@rgalanakis
Contributor

rgalanakis commented Jan 7, 2024

Redis has connection problems, for example:

https://lithic-technology.sentry.io/issues/4264119348
https://lithic-technology.sentry.io/issues/4264118801

When Redis isn't available, the job fails and would be lost entirely. DurableJobs detects this and adds the job to the DeadSet, so we need to replay these jobs manually.

Instead, DurableJobs should move jobs that fail due to connection errors in the Redis connection layer into the RetrySet so they are automatically retried.

The reset happens in redis/connection/ruby.rb, in connect_nonblock at line 264.
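
A rough sketch of the routing proposed above, to make the idea concrete. This is not the DurableJobs or Sidekiq API: `JobSet`, `FailureRouter`, and the `CONNECTION_ERRORS` list are hypothetical stand-ins, and which exception classes count as "connection errors" is an assumption (a real version would probably also need to match the errors raised by redis-rb itself).

```ruby
# Hypothetical sketch only; none of these names come from DurableJobs or Sidekiq.

# Assumption: these stdlib socket errors are what we treat as connection-level failures.
CONNECTION_ERRORS = [
  Errno::ECONNRESET,
  Errno::ECONNREFUSED,
  Errno::ETIMEDOUT,
  Errno::EPIPE,
].freeze

# In-memory stand-in for a destination set (retry or dead).
class JobSet
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def add(job)
    @jobs << job
  end
end

# Decides where a failed job goes based on the class of the error that failed it.
class FailureRouter
  attr_reader :retry_set, :dead_set

  def initialize(retry_set: JobSet.new, dead_set: JobSet.new)
    @retry_set = retry_set
    @dead_set = dead_set
  end

  def connection_error?(error)
    CONNECTION_ERRORS.any? { |klass| error.is_a?(klass) }
  end

  # Connection errors go to the retry set; everything else goes to the dead set.
  # Returns the set the job was added to.
  def route(job, error)
    target = connection_error?(error) ? retry_set : dead_set
    target.add(job.merge("error_class" => error.class.name))
    target
  end
end

router = FailureRouter.new
router.route({"jid" => "a"}, Errno::ECONNRESET.new)      # lands in retry_set
router.route({"jid" => "b"}, ArgumentError.new("bad"))   # lands in dead_set
```

Matching on error class keeps the decision in one place, so widening or narrowing what counts as a connection error is a one-line change.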

Migrated from webhookdb-api#707. Opened by @sentry-io[bot]

Opened at: 2023-06-21T05:58:21Z
Closed at: 2023-06-29T18:42:13Z

@rgalanakis rgalanakis added this to the 10 - Stability milestone Jan 7, 2024
@rgalanakis rgalanakis added the bug Something is not working right/as expected. label Jan 7, 2024
@rgalanakis
Contributor Author

Sentry issue: WEBHOOKDB-API-J6

Original Comment by @sentry-io[bot]

@rgalanakis
Contributor Author

Possibly fixed in #718

Original Comment by @rgalanakis

@rgalanakis
Contributor Author

Or in #719

Original Comment by @rgalanakis

@rgalanakis
Contributor Author

We can try upgrading Sidekiq to fix this. This should not be a change in Durable Jobs: it is a bug in infra or libraries, and we should not hide that.

Original Comment by @rgalanakis

@rgalanakis
Contributor Author

Heroku support said to try updating the timeout to 25 hours, as per redis/redis-rb#1107 (comment)

heroku redis:timeout redis-curved-26048 --seconds=90000 --app=webhookdb-api-production
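
For reference, the `--seconds` value is just 25 hours expressed in seconds. A small sketch that derives it and rebuilds the command above; `redis_timeout_command` is a hypothetical helper for illustration, not part of any tool.

```ruby
SECONDS_PER_HOUR = 60 * 60

# Hypothetical helper: builds the CLI string shown above from a timeout in hours.
def redis_timeout_command(addon:, app:, hours:)
  seconds = hours * SECONDS_PER_HOUR
  "heroku redis:timeout #{addon} --seconds=#{seconds} --app=#{app}"
end

puts redis_timeout_command(
  addon: "redis-curved-26048",
  app: "webhookdb-api-production",
  hours: 25
)
# prints: heroku redis:timeout redis-curved-26048 --seconds=90000 --app=webhookdb-api-production
```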

Original Comment by @rgalanakis

@rgalanakis
Contributor Author

Closing this for now but will reopen if not fixed.

Original Comment by @rgalanakis
