
Add digest scores for faster deletes in sorted sets #835

Conversation

@ezekg (Contributor) commented Feb 13, 2024

Closes #668. This is a first pass on my idea of using scores to "skip ahead" so that we don't have to iterate the entire sorted set when a unique job is deleted. Let me know what you think. I'm sure I'm missing some edge cases, since I don't have the domain knowledge you do here. This should be backwards compatible, falling back to the previous behavior if a score doesn't exist.

I still want to add a performance test with a large schedule queue to make sure this actually works.
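
To make the idea concrete, here is a minimal Lua sketch of a score-based delete, assuming the digest's score was stored somewhere at enqueue time (the names here are illustrative, not the PR's actual code):

```lua
-- Hypothetical sketch of the "skip ahead" delete. `set` is e.g.
-- Sidekiq's "schedule" sorted set, `digest` the unique-lock digest
-- embedded in the job payload, and `score` the stored score.
local function delete_by_score(set, digest, score)
  -- Seek straight to the known score (BYSCORE needs Redis >= 6.2)
  -- instead of paging through the whole set from index 0.
  local jobs = redis.call("ZRANGE", set, score, "+inf", "BYSCORE", "LIMIT", 0, 50)
  for _, job in ipairs(jobs) do
    -- Plain-text match of the digest inside the serialized payload.
    if string.find(job, digest, 1, true) then
      redis.call("ZREM", set, job)
      return 1
    end
  end
  return 0
end
```

Without a stored score, the fallback is the previous behavior: walking the set page by page.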

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from e2e1b02 to 565f1c4 on February 13, 2024 22:41
Review comment on lib/sidekiq_unique_jobs/key.rb (outdated, resolved)
@mhenrixon (Owner) commented Feb 14, 2024

Let's think about this for a while. We already store a score when the digest is created. It has both a score and a job_id. We could simplify this by looking up the current digest's score and job id.

We also have this issue: Sidekiq only checks with the heartbeat method every ten seconds. This means that for ten seconds, things can be missing.

@ezekg (Contributor, Author) commented Feb 14, 2024

> We already store a score when the digest is created. It has both a score and a job_id.

Ah, I wasn't aware we were already storing the score. I looked around but I guess I missed it. That makes things easier.

> Sidekiq only checks with the heartbeat method every ten seconds. This means that for ten seconds, things can be missing.

What does Sidekiq's heartbeat system do? I'm not familiar with it, or with what "missing" means here.

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 565f1c4 to 60e380f on February 14, 2024 17:20
@mhenrixon (Owner) commented Feb 14, 2024

> > We already store a score when the digest is created. It has both a score and a job_id.
>
> Ah, I wasn't aware we were already storing the score. I looked around but I guess I missed it. That makes things easier.
>
> > Sidekiq only checks with the heartbeat method every ten seconds. This means that for ten seconds, things can be missing.
>
> What does Sidekiq's heartbeat system do? I'm not familiar with it, or with what "missing" means here.

Both of your questions are addressed here: https://github.com/mhenrixon/sidekiq-unique-jobs/pull/830/files.

In that PR, I am optimizing the Ruby reaper to use the digest score (ZRANGE with BYSCORE) to limit the number of jobs we look through, and to avoid removing locks that only appear to have no job because the job has been "missing" for up to ten seconds.

There are some linked issues (I hope) that discuss this, and somewhere there is a link to a Sidekiq issue where @mperham explains it.

We could apply similar thinking to your issue.

EDIT: @ezekg this is the sidekiq issue I was talking about: sidekiq/sidekiq#6153

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 4 times, most recently from 664c259 to c1f52e8 on February 16, 2024 22:20
@ezekg (Contributor, Author) commented Feb 16, 2024

@mhenrixon I added a rudimentary performance test that fails for the old algorithm.

For a schedule queue of 100,000 jobs, we go from this:

1708122488.686405 [0 lua] "ZRANGE" "schedule" "0" "49"
1708122488.686549 [0 lua] "ZRANGE" "schedule" "50" "99"
1708122488.686651 [0 lua] "ZRANGE" "schedule" "100" "149"
1708122488.686728 [0 lua] "ZRANGE" "schedule" "150" "199"
1708122488.686830 [0 lua] "ZRANGE" "schedule" "200" "249"
... 1,991 lines omitted
1708122488.824228 [0 lua] "ZRANGE" "schedule" "99800" "99849"
1708122488.824349 [0 lua] "ZRANGE" "schedule" "99850" "99899"
1708122488.824406 [0 lua] "ZRANGE" "schedule" "99900" "99949"
1708122488.824463 [0 lua] "ZRANGE" "schedule" "99950" "99999"
1708122488.824502 [0 lua] "ZRANGE" "schedule" "100000" "100049"
1708122488.824523 [0 lua] "ZREM" "schedule" "{\"retry\":true,\"queue\":\"customqueue\",\"lock\":\"until_executing\",\"on_conflict\":\"replace\",\"class\":\"UniqueJobOnConflictReplace\",\"args\":[100000,{\"type\":\"extremely unique\"}],\"jid\":\"4f2abef9a3c66954b92499c8\",\"created_at\":1708122488.6721134,\"lock_timeout\":0,\"lock_ttl\":null,\"lock_prefix\":\"uniquejobs\",\"lock_args\":[100000,{\"type\":\"extremely unique\"}],\"lock_digest\":\"uniquejobs:6a7e9a8bcee1870891c2e9b633fb4f86\"}"

To this:

1708122430.856488 [0 lua] "ZRANGE" "schedule" "1710714430.8422594" "+inf" "BYSCORE" "LIMIT" "0" "50"
1708122430.856549 [0 lua] "ZREM" "schedule" "{\"retry\":true,\"queue\":\"customqueue\",\"lock\":\"until_executing\",\"on_conflict\":\"replace\",\"class\":\"UniqueJobOnConflictReplace\",\"args\":[100000,{\"type\":\"extremely unique\"}],\"jid\":\"49d4e63dcb6d767c17af0470\",\"created_at\":1708122430.8423865,\"lock_timeout\":0,\"lock_ttl\":null,\"lock_prefix\":\"uniquejobs\",\"lock_args\":[100000,{\"type\":\"extremely unique\"}],\"lock_digest\":\"uniquejobs:6a7e9a8bcee1870891c2e9b633fb4f86\"}"
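
(For context: the old approach pages through all N members in 50-element chunks, about 2,000 ZRANGE calls for a 100,000-job schedule, while the BYSCORE variant seeks directly to the stored score and issues a single call.)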

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 4 times, most recently from 851a8f5 to bed17eb on February 17, 2024 03:25
@mhenrixon (Owner) commented Feb 17, 2024

I really appreciate you taking a stab at this @ezekg! You spotted a couple of bugs!

  1. Unnecessary looping due to missing break.
  2. zrange is super slow in this case; we should be using zscan, of course.

Can you check if the linked PR fixes your problem, too? I would prefer a simpler fix over adding these extra keys. If it still isn't good enough, I'm happy to pair on sorting this out!

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from bed17eb to 203f087 on February 17, 2024 20:32
@mhenrixon (Owner) commented:

If you have a look here: https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/lock.lua#L66, we already add a score for the digest. We also store the timestamp in the hash with digest + job_id (this is to allow a specified number of concurrent jobs): https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/lock.lua#L70

Between those two, it is a hard sell to add more timestamps. I am more inclined to attempt to reduce the number of commands than to increase them.

Could you find a way to make do with what is there, or do a great job of selling the extra key to me, @ezekg? I want to help you, I really do!
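
For reference, a loose sketch of the two writes described above (paraphrased from this thread, not the actual lock.lua source; in the real script these values come from KEYS/ARGV):

```lua
-- Loose, paraphrased sketch, not the actual lock.lua code.
local digests      = KEYS[1]          -- sorted set of all digests
local digest       = KEYS[2]          -- this job's unique digest
local job_id       = ARGV[1]
local current_time = tonumber(ARGV[2])

-- lock.lua#L66: score the digest with the time the lock was created.
redis.call("ZADD", digests, current_time, digest)
-- lock.lua#L70: keep a timestamp per job_id in the digest's hash,
-- which is what allows a configurable number of concurrent jobs.
redis.call("HSET", digest, job_id, current_time)
```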

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 203f087 to 13d562f on February 21, 2024 15:19
@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 13d562f to f53f91c on February 21, 2024 16:44
@ezekg (Contributor, Author) commented Feb 21, 2024

@mhenrixon the current implementation does not store the actual score of the job in the sorted set (i.e. the timestamp at which a scheduled job is due to run), but rather the time at which the job was added to the sorted set. I updated the implementation to store the job's score when available. The stored score needs to match the job's score in Sidekiq's schedule sorted set, otherwise it is useless for a divide-and-conquer search.

Let me know when you have time to review the new approach. I don't know if the previous timestamp was actually used for anything, but it didn't look like it at first glance.
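
A minimal sketch of the change being described (hypothetical names, not the PR's actual diff): store the job's scheduled-at time as the digest's score whenever it is available, so it matches the score Sidekiq uses in its "schedule" sorted set.

```lua
-- Hypothetical sketch of storing the job's real score.
local digests      = KEYS[1]
local digest       = KEYS[2]
local current_time = tonumber(ARGV[1])
local at           = tonumber(ARGV[2]) -- scheduled run time, or nil

-- Prefer the job's own score so it lines up with Sidekiq's
-- "schedule" sorted set; fall back to the old behavior (the time
-- the digest was added) when the job isn't scheduled.
redis.call("ZADD", digests, at or current_time, digest)
```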

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 3 times, most recently from 2548e5c to 0fb19b6 on February 21, 2024 17:25
@@ -62,8 +63,16 @@ if lock_type == "until_expired" and pttl and pttl > 0 then
log_debug("ZADD", expiring_digests, current_time + pttl, digest)
redis.call("ZADD", expiring_digests, current_time + pttl, digest)
@ezekg (Contributor, Author) commented on the diff:

I left this alone because I don't fully understand what the expiring digests set is used for, or whether this optimization would apply to jobs using an until_expired lock strategy.

@mhenrixon (Owner) left a review:

This is exactly what I had in mind. Sorry that I haven't been able to get to it.

Your assumption about the job score in the hash is correct; I just wanted something in the hash and wasn't clear on how to use it. It was one of those "I believe I will need this" moments.

As soon as the German IRS is off my back (they want money I don't have), I'll see about greatly optimizing the gem.

I never got around to looking at the performance.

Perhaps this would be better solved in Ruby (like the reaper).

I'm definitely not opposed to using some batching from the Ruby layer if there are more than n entries in a sorted set.

I believe there is plenty to optimize, and for the performance tests I need to remember that my machine is as fast as laptops come.

It's not fair to compare locally; I should probably write a bunch of these performance tests and have them run on GitHub Actions.

@ezekg ezekg marked this pull request as ready for review February 21, 2024 18:33
@ezekg
Copy link
Contributor Author

ezekg commented Feb 21, 2024

Do you want me to loop in the changes from #837? That PR also had improvements to delete_from_queue, which I didn't touch here. Let me know if you want me to cherry-pick them, or if you'd rather do that separately.

@mhenrixon (Owner) commented:

> Do you want me to loop in the changes from #837?

That would be lovely; I forgot about the queue! I'm too exhausted to cut a release and test this tonight, but I will do it first thing tomorrow morning.

Much appreciated!!

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 2 times, most recently from 11f17a8 to 8131bed on February 21, 2024 22:30
@ezekg (Contributor, Author) commented Feb 21, 2024

Done. Let me know what you find whenever you have a chance to test.

@ezekg ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 8131bed to 5da7d08 on February 21, 2024 22:39
@mhenrixon mhenrixon enabled auto-merge (squash) February 22, 2024 07:14
@mhenrixon mhenrixon merged commit 1bfba2f into mhenrixon:main Feb 22, 2024
18 checks passed
@mhenrixon (Owner) commented:

Looks like it will do the job just fine! Can't wait to optimize everything!

Merging this pull request may close: Slow evalsha causing timeouts