
Implementation for transaction #299

Closed
supercaracal opened this issue Dec 11, 2023 · 16 comments

Labels
bug Something isn't working · enhancement New feature or request

Comments

@supercaracal
Member

supercaracal commented Dec 11, 2023

This issue was opened just to gather the related pull requests in one place. It will be closed when they are merged or closed.

  • feat: support transaction #277
  • Fix MULTI transactions calling the block twice #294
  • Support watch & unwatch properly #295
  • Implement "pinning" cluster connection to a single node - RedisClient::Cluster#with #298
  • Considerations:
    • the determination of the node that the transaction should be sent to
      • The client can't determine the node until the first command that includes keys is passed.
    • the handling of redirects during resharding
      • The client should not handle redirections for errors caused by user-side bugs.
    • the optimistic locking with the WATCH command
      • The WATCH command can watch multiple keys. However, it has to be sent to a single node in a sharded cluster, so in practice it only allows keys belonging to a single slot (see the hash-tag sketch after this list).
    • the key validation
      • There is a discussion about whether it should be done proactively or reactively.
      • I think the client doesn't need proactive validation if the server does strict validation.
    • the compatibility with the redis-clustering gem
      • I think the redis-cluster-client gem should not become excessively complex because of the redis gem.
    • the design
      • We should keep the interface as close to redis-client's as possible.
      • We shouldn't add complexity to the code base just for this feature. Once the code gets messed up, it will be hard to maintain.
        • Of course, all bugs should be fixed.
      • Our gem shouldn't be affected in excess by the redis-clustering gem.
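
To make the single-slot constraint on WATCH concrete, here is a minimal sketch using a redis-rb-style block interface like the one discussed later in this thread; the key names and values are made up, and the multi-key watch signature is an assumption, not a settled API:

# Both keys carry the hash tag "{user:42}"; only the tag is hashed, so they
# map to the same slot and can be watched from a single node.
redis.watch("{user:42}:balance", "{user:42}:orders") do
  if redis.get("{user:42}:balance").to_i >= 10
    redis.multi do |tx|
      tx.decrby("{user:42}:balance", 10)
      tx.incr("{user:42}:orders")
    end
  else
    redis.unwatch
  end
end

# By contrast, "balance:42" and "orders:42" share no hash tag, generally hash
# to different slots, and so can't be watched together on a single node.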
@supercaracal supercaracal self-assigned this Dec 11, 2023
@supercaracal supercaracal added enhancement New feature or request bug Something isn't working labels Dec 11, 2023
@KJTsanaktsidis
Contributor

OK - I think I'm going to close all three of those PRs (each of which tries to solve the whole implementation at once, in three different ways), and begin opening smaller PRs for some of the prereq work to get cluster transactions working slightly better.

@supercaracal supercaracal changed the title from "Implementation of transaction" to "Implementation for transaction" Dec 12, 2023
@KJTsanaktsidis
Contributor

OK, next up is a new test helper & some further refactoring of the error handling logic in the router:

(Thanks for looking at the previous two so promptly!)

@KJTsanaktsidis
Contributor

I opened #306 as an alternative to handling blocking_v specially in response to your feedback.

@supercaracal
Member Author

The above pull request has been merged. Thank you.

@KJTsanaktsidis
Contributor

And for my next trick, now these two blocks of code are identical, so we can refactor this:

@KJTsanaktsidis
Contributor

Two more PRs for your consideration @supercaracal :)

@KJTsanaktsidis
Contributor

OK, a couple more @supercaracal! This is mostly code you've looked at before, but I think they can now be merged because all of the prerequisites are done.

@KJTsanaktsidis
Contributor

This week's PRs @supercaracal

As always, thank you! 🙇

@KJTsanaktsidis
Contributor

❤️ I see you've made some big changes to the transaction implementation over the weekend @supercaracal !

I'm going to close both of my open PRs then. I'm going to open a couple of new ones today, hopefully to address a couple of small enhancements:

redis.watch("key") do
  if redis.get("key") == "some value"
    redis.multi do |multi|
      multi.set("key", "other value")
      multi.incr("counter")
    end
  else
    redis.unwatch
  end
end
# => ["OK", 6]

I suspect if I can make that work, the #with approach might be redundant and we could delete it.
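
(For context, the pinning style from #298 that might become redundant looks roughly like the sketch below; the #with signature and the key names here are assumed for illustration, not a confirmed API.)

# Hypothetical shape of the pinning approach: yield a connection pinned to the
# node owning the slot of the given key, then run WATCH/MULTI/EXEC on it.
cluster.with(key: "{user:42}") do |conn|
  conn.call("WATCH", "{user:42}:balance")
  if conn.call("GET", "{user:42}:balance").to_i >= 10
    conn.multi do |tx|
      tx.call("DECRBY", "{user:42}:balance", 10)
      tx.call("INCR", "{user:42}:orders")
    end
  else
    conn.call("UNWATCH")
  end
end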

@supercaracal
Member Author

Thank you for your feedback and reviews. As you said, I think there are still some bugs. I hope it can be improved by all of us.

@KJTsanaktsidis
Contributor

Nice, you fixed most of ☝️ before I managed to 😁

I opened another PR which fixes a couple more issues with retries around watches: #338

I also saw you started work on exposing an interface to #watch which redis-rb can use - Is this something you would like me to help you with? Or would you prefer to get a version working and then I can test it out in our environment & fix any bugs I find?

@supercaracal
Member Author

That would be helpful, thank you!

Unfortunately, I don't use this gem in the production environment I'm working on. I feel there are still some bugs.

@KJTsanaktsidis
Contributor

@supercaracal We performed some pretty extensive testing today at Zendesk with redis-cluster-client on our pre-production environment, and in particular checked how transaction support behaved with Elasticache scale in and scale out operations. I'm pleased to report it was a big success!

We tested deploying our app with:

We found:

  • Under our current production conditions, with a redis "cluster" with a single shard, our transactions (both MULTI ... EXEC transactions and WATCH ... MULTI ... EXEC transactions) made through redis-rb all worked correctly.
  • We then scaled out the cluster to have five shards. Everything worked as expected:
    • Non-transaction cluster-mode operations seamlessly handled the redirections as expected
    • In theory we could have seen a brief interruption of our WATCH ... MULTI ... EXEC transactions, but actually the single retry in here was enough to hide all of the errors.
  • We then saw a long tail of "errors" being reported to Datadog (our APM provider) from the underlying redis-client connections as the cluster-client updated its slot maps in response to MOVED responses, but these were completely handled by redis-cluster-client and so the application never saw them. Again, this is working as expected!
  • Then, we scaled the cluster back in to one node. This caused up to a minute or so of RedisClient::NodeMightBeDown errors to be raised because there's a period of time where Elasticache's response to CLUSTER SLOTS no longer covers all of the slots (this is quite annoying, but it's an AWS problem - redis-cluster-client handled this correctly).
  • Once the slots were all served by Elasticache, everything became functional again
  • There was, again, a long tail of errors from the underlying redis-client connections being reported to Datadog; this time, they were RedisClient::ConnectionError errors, caused by cluster-clients attempting to use connections that had been disconnected on the AWS side. redis-cluster-client correctly caught these exceptions, and re-queried the cluster topology.

There was one bug which I fixed. When catching a RedisClient::ConnectionError in the Router here, we call update_cluster_info! to requery the topology (since a server we knew about went away). However, we then perform a retry on the same node object that raised the ConnectionError, not taking into account the new topology. We actually need to perform find_node there again before retrying.
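
As a rough sketch of the corrected flow (the method shape and the helpers other than update_cluster_info! and find_node are assumed here for illustration, not the actual Router code):

# Simplified, illustrative retry flow inside the Router.
def try_send(node, method, command, args, retry_count: 3)
  node.public_send(method, *args, command)
rescue ::RedisClient::ConnectionError
  raise if retry_count <= 0
  retry_count -= 1

  # A node we knew about went away, so refresh the topology first...
  update_cluster_info!

  # ...then look the node up again instead of retrying the stale node object.
  node = find_node(find_node_key(command))
  retry
end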

I fixed that in these two commits:

I'll open PRs for these after #338 and #339 are merged, though, so I don't create more conflicts.


To summarise, everything worked great with both WATCH and non-watch transactions in the presence of scale-outs and scale-ins! Thanks so much for your work on this gem and for working with me to improve the transaction support.

@supercaracal
Member Author

I appreciate that a lot. I'll check it as soon as possible.

@KJTsanaktsidis
Contributor

Just wanted to report back here that we deployed the fork above to production as well last week and it's been going well (in fact we got a nice little latency dip because of the general performance improvements in redis-client vs redis-rb v4 I think).

Would love it if #340 could be merged soon so we can avoid using a fork of the gem!

@supercaracal
Member Author

Bugs were fixed.
