
Implementation for transaction #299

Closed
supercaracal opened this issue Dec 11, 2023 · 16 comments

Labels
bug Something isn't working · enhancement New feature or request

Comments

@supercaracal
Member

supercaracal commented Dec 11, 2023

This issue was opened just to gather the related pull requests in one place. It will be closed when they are merged or closed.

  • feat: support transaction #277
  • Fix MULTI transactions calling the block twice #294
  • Support watch & unwatch properly #295
  • Implement "pinning" cluster connection to a single node - RedisClient::Cluster#with #298
  • Considerations:
    • the determination of the node that the transaction should be sent to
      • The client can't determine the node until the first command that includes keys is passed.
    • the handling of redirects during resharding
      • The client should not handle redirections for errors caused by user-side bugs.
    • the optimistic locking with the WATCH command
      • The WATCH command can watch multiple keys. However, it has to be sent to a single node in a sharded cluster, so in practice it only allows keys belonging to a single slot (see the hash-tag sketch after this list).
    • the key validation
      • There is a discussion about whether it should be done proactively or reactively.
      • I think the client doesn't need proactive validation if the server does strict validation.
    • the compatibility with the redis-clustering gem
      • I think the redis-cluster-client gem should not become excessively complex because of the redis gem.
    • the design
      • We should keep the interface as close to redis-client's as possible.
      • We shouldn't add complexity to the code base just for this feature. Once the code gets messed up, it will be hard to maintain.
        • Of course, all bugs should be fixed.
      • Our gem shouldn't be affected in excess by the redis-clustering gem.
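
To make the single-slot constraint on WATCH concrete, here is a minimal sketch using a redis-rb-style block interface like the one discussed later in this thread; the key names and values are made up, and the multi-key watch signature is an assumption, not a settled API:

# Both keys carry the hash tag "{user:42}"; only the tag is hashed, so they
# map to the same slot and can be watched from a single node.
redis.watch("{user:42}:balance", "{user:42}:orders") do
  if redis.get("{user:42}:balance").to_i >= 10
    redis.multi do |tx|
      tx.decrby("{user:42}:balance", 10)
      tx.incr("{user:42}:orders")
    end
  else
    redis.unwatch
  end
end

# By contrast, "balance:42" and "orders:42" share no hash tag, generally hash
# to different slots, and so can't be watched together on a single node.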
@supercaracal supercaracal self-assigned this Dec 11, 2023
@supercaracal supercaracal added enhancement New feature or request bug Something isn't working labels Dec 11, 2023
@KJTsanaktsidis
Contributor

OK - I think I'm going to close all three of those PRs (each of which tries to solve the whole implementation at once, in three different ways), and begin opening smaller PRs for some of the prereq work to get cluster transactions working slightly better.

@supercaracal supercaracal changed the title from "Implementation of transaction" to "Implementation for transaction" Dec 12, 2023
@KJTsanaktsidis
Contributor

OK, next up is a new test helper & some further refactoring of the error handling logic in the router:

(Thanks for looking at the previous two so promptly!)

@KJTsanaktsidis
Contributor

I opened #306 as an alternative to handling blocking_v specially in response to your feedback.

@supercaracal
Member Author

The above pull request has been merged. Thank you.

@KJTsanaktsidis
Contributor

And for my next trick, now these two blocks of code are identical, so we can refactor this:

@KJTsanaktsidis
Contributor

Two more PRs for your consideration @supercaracal :)

@KJTsanaktsidis
Contributor

OK, a couple more @supercaracal! This is mostly code you've looked at before, but I think they can now be merged because all of the prerequisites are done.

@KJTsanaktsidis
Contributor

This week's PRs @supercaracal

As always, thank you! 🙇

@KJTsanaktsidis
Contributor

❤️ I see you've made some big changes to the transaction implementation over the weekend @supercaracal !

I'm going to close both of my open PRs then. I'm going to open a couple of new ones today, hopefully to address a couple of small enhancements:

redis.watch("key") do
  if redis.get("key") == "some value"
    redis.multi do |multi|
      multi.set("key", "other value")
      multi.incr("counter")
    end
  else
    redis.unwatch
  end
end
# => ["OK", 6]

I suspect if I can make that work, the #with approach might be redundant and we could delete it.
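
(For context, the pinning style from #298 that might become redundant looks roughly like the sketch below; the #with signature and the key names here are assumed for illustration, not a confirmed API.)

# Hypothetical shape of the pinning approach: yield a connection pinned to the
# node owning the slot of the given key, then run WATCH/MULTI/EXEC on it.
cluster.with(key: "{user:42}") do |conn|
  conn.call("WATCH", "{user:42}:balance")
  if conn.call("GET", "{user:42}:balance").to_i >= 10
    conn.multi do |tx|
      tx.call("DECRBY", "{user:42}:balance", 10)
      tx.call("INCR", "{user:42}:orders")
    end
  else
    conn.call("UNWATCH")
  end
end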

@supercaracal
Member Author

Thank you for your feedback and reviews. As you said, I think there are still some bugs. I hope it can be improved by all of us.

@KJTsanaktsidis
Contributor

Nice, you fixed most of ☝️ before I managed to 😁

I opened another PR which fixes a couple more issues with retries around watches: #338

I also saw you started work on exposing an interface to #watch which redis-rb can use - Is this something you would like me to help you with? Or would you prefer to get a version working and then I can test it out in our environment & fix any bugs I find?

@supercaracal
Member Author

That would be helpful, thank you!

Unfortunately, I don't use this gem in the production environment I'm working on. I feel there are still some bugs.

@KJTsanaktsidis
Contributor

@supercaracal We performed some pretty extensive testing today at Zendesk with redis-cluster-client on our pre-production environment, and in particular checked how transaction support behaved with Elasticache scale in and scale out operations. I'm pleased to report it was a big success!

We tested deploying our app with:

We found:

  • Under our current production conditions, with a redis "cluster" with a single shard, our transactions (both MULTI ... EXEC transactions and WATCH ... MULTI ... EXEC transactions) made through redis-rb all worked correctly.
  • We then scaled out the cluster to have five shards. Everything worked as expected:
    • Non-transaction cluster-mode operations seamlessly handled the redirections as expected
    • In theory we could have seen a brief interruption of our WATCH ... MULTI ... EXEC transactions, but actually the single retry in here was enough to hide all of the errors.
  • We then saw a long tail of "errors" being reported to Datadog (our APM provider) from the underlying redis-client connections as the cluster-client updated its slot maps in response to MOVED responses, but these were completely handled by redis-cluster-client and so the application never saw them. Again, this is working as expected!
  • Then, we scaled the cluster back in to one node. This caused up to a minute or so of RedisClient::NodeMightBeDown errors to be raised because there's a period of time where Elasticache's response to CLUSTER SLOTS no longer covers all of the slots (this is quite annoying, but it's an AWS problem - redis-cluster-client handled this correctly).
  • Once the slots were all served by Elasticache, everything became functional again
  • There was, again, a long tail of errors from the underlying redis-client connections being reported to Datadog; this time, they were RedisClient::ConnectionError errors, caused by cluster-clients attempting to use connections that had been disconnected on the AWS side. redis-cluster-client correctly caught these exceptions, and re-queried the cluster topology.

There was one bug which I fixed. When catching a RedisClient::ConnectionError in the Router here, we call update_cluster_info! to requery the topology (since a server we knew about went away). However, we then perform a retry on the same node object that raised the ConnectionError, not taking into account the new topology. We actually need to perform find_node there again before retrying.
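
As a rough sketch of the corrected flow (the method shape and the helpers other than update_cluster_info! and find_node are assumed here for illustration, not the actual Router code):

# Simplified, illustrative retry flow inside the Router.
def try_send(node, method, command, args, retry_count: 3)
  node.public_send(method, *args, command)
rescue ::RedisClient::ConnectionError
  raise if retry_count <= 0
  retry_count -= 1

  # A node we knew about went away, so refresh the topology first...
  update_cluster_info!

  # ...then look the node up again instead of retrying the stale node object.
  node = find_node(find_node_key(command))
  retry
end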

I fixed that in these two commits:

I'll open PRs for these after #338 and #339 are merged, though, so I don't create more conflicts.


To summarise, everything worked great with both WATCH and non-watch transactions in the presence of scale-outs and scale-ins! Thanks so much for your work on this gem and for working with me to improve the transaction support.

@supercaracal
Member Author

I appreciate that a lot. I'll check it as soon as possible.

@KJTsanaktsidis
Contributor

Just wanted to report back here that we deployed the fork above to production as well last week and it's been going well (in fact we got a nice little latency dip because of the general performance improvements in redis-client vs redis-rb v4 I think).

Would love it if #340 could be merged soon so we can avoid using a fork of the gem!

@supercaracal
Member Author

Bugs were fixed.
