Fully non-blocking distributed bucket #326
-
Hello, @vladimir-bukhtoyarov! I like the library a lot, but I can't figure out whether a few scenarios can be achieved with the current features. If not, I would be grateful for any guidance on how to implement them on top of the existing source code. I have been battling these for weeks, and I really tried my best before posting this.

Q1: I have a distributed system, and I have chosen Ignite as my cache cluster (probably not relevant, it could be any other). There are no sticky sessions, and the load isn't always evenly distributed. This is what I want the workflow to be like:
I have found Optimization.delaying, which does almost exactly that. However, if I understand it correctly, there is one huge difference: it returns a CompletableFuture, and if the bucket sees that it is time to synchronize, the future won't complete until the synchronization finishes. I cannot afford these kinds of delays. I want the actual cluster update to be issued behind the scenes (with the local bucket updated as it completes), while the local value is always instantly returned from tryConsume(). I wanted to try something like this:
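Something along these lines (a rough sketch: the slow remote call and the local answer are stand-ins, only CompletableFuture.completeOnTimeout is real JDK 9+ API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class LocalFirstConsume {
    public static void main(String[] args) throws Exception {
        // stand-in for AsyncBucketProxy#tryConsume hitting the slow cache cluster
        CompletableFuture<Boolean> remote = CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            return true; // the cluster-wide decision, arriving too late for the caller
        });

        // answer from local state if the remote decision is not in yet
        boolean localAnswer = true; // stand-in for a local bucket's tryConsume(1)
        CompletableFuture<Boolean> answer =
                remote.completeOnTimeout(localAnswer, 10, TimeUnit.MILLISECONDS);

        System.out.println(answer.get()); // the local answer wins the race
    }
}
```

The catch: completeOnTimeout completes the very same future, so when the real remote result shows up 500 ms later it is silently dropped.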
However, I believe a CompletableFuture cannot be completed multiple times, so if the future actually completes on timeout, the synchronization result that arrives later may be processed incorrectly. Moreover, this solution implies that I always return true if the cache cluster is unreachable, which is far worse than continuing to serve locally. Note that if the bucket issues such a behind-the-scenes update at point T1, and it completes at T2, then all local changes between T1 and T2 should be included in the update. I really want these strong timing guarantees, and I would appreciate any advice.

Q2: Suppose I have 3 nodes with the delaying optimization and a distributed bucket with capacity 10. All 3 get 5 requests in quick succession, successfully serve them locally (tryConsume(1), for example), and then begin to sync. Will the final token value be 0 or -5 (negative 5)? The latter is sometimes desirable for me; it would require every tryConsume-style request to be sent for synchronization as a force-consume. Is this doable with a simple source-code modification? Alternatively, maybe I'm solving the wrong problem. What really concerns me is the following scenario:
A workaround that looks decent is to refill such buckets rarely and instantly to full capacity. That way, if the sync works reasonably fast, the attacker wouldn't be able to spend all 1000 tokens before the sync arrives and cuts him off. I believe the solution I described at the beginning would be perfect, and I would appreciate either advice on implementing it or an alternative that helps with the underlying problem.

Q3: The documentation states that AsyncBucketProxy is not a cheap object when created with optimizations (quite obviously), so I cache one for each remote bucket. However, sometimes I will have to update the configuration on my server nodes, not necessarily at exactly the same time. What happens if two nodes hold AsyncBucketProxy instances with different configurations and try to synchronize them via the cache cluster? I know there are merging strategies for distributed caches, but maybe the library already takes care of this nicely and I don't have to overthink it.
Replies: 1 comment 6 replies
-
@muldrik hello,

A1:
Currently there are no optimizations that allow operating fully asynchronously. All the currently implemented optimizations try to reduce the number of remote requests, but when conditions dictate that a sync needs to be done, the sync is always performed in the scope of the user's thread.
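A fully asynchronous variant would have to hand the sync off to a background executor and answer from local state immediately. Roughly like this (illustrative names, not Bucket4j's real Optimization SPI; the remote call itself is elided):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: tryConsume is answered from local state, while the remote sync
// runs in a background executor so the caller never waits for it.
public class BackgroundSyncBucket {
    private final ExecutorService syncExecutor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // don't keep the JVM alive for background syncs
        return t;
    });
    private final AtomicBoolean syncInFlight = new AtomicBoolean();
    private long localTokens;
    private long unsyncedConsumed; // local changes to merge on the next sync

    public BackgroundSyncBucket(long initialTokens) {
        this.localTokens = initialTokens;
    }

    public synchronized boolean tryConsume(long n) {
        boolean allowed = localTokens >= n;
        if (allowed) {
            localTokens -= n;
            unsyncedConsumed += n;
        }
        // trigger the sync behind the scenes, at most one at a time
        if (unsyncedConsumed > 0 && syncInFlight.compareAndSet(false, true)) {
            syncExecutor.submit(this::syncWithCluster);
        }
        return allowed; // always the instant, local answer
    }

    private void syncWithCluster() {
        long toReport;
        synchronized (this) {
            toReport = unsyncedConsumed;
            unsyncedConsumed = 0;
        }
        try {
            // imagine the remote request here: merge 'toReport' consumed tokens
            // into the cluster-wide state and refresh localTokens from the reply
        } finally {
            syncInFlight.set(false);
        }
    }
}
```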
Optimization is an interface, so feel free to implement your own. I would recommend starting from https://github.com/bucket4j/bucket4j/blob/master/bucket4j-core/src/main/java/io/github/bucket4j/distributed/proxy/optimization/delay/DelayOptimization.java; it already inherits from BatchingOptimization, so you have a guarantee that your code will be executed from one thread. I can create a mock-up for you; all you would need to do is implement the background call in some executor (and test it). It should not take more than an hour on my side.

A2:
Yes, the available tokens will become negative on all client nodes (as well as on the server nodes) after synchronization. A negative amount of available tokens is a normal case for the Bucket4j math model; in fact, I would call it a killer feature. It was initially introduced to support BlockingBucket, in order to protect a parked thread from situations where other threads steal the requested tokens while it is parked waiting for the refill deficit.

A3:
The proxy should not be created on a per-request basis, because the optimizations group requests on a particular bucket instance. In reality an optimized bucket is not that costly; I suppose around 200 bytes, and you can estimate it by investigating a heap dump. But as described above, create->call->forget is a useless strategy for optimized buckets.
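In other words, keep one long-lived proxy per bucket key, along these lines (a sketch; the type parameter and createProxy function stand in for AsyncBucketProxy and your actual builder call):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// One long-lived proxy per bucket key, so the optimization can keep
// grouping requests on the same instance. 'P' stands in for AsyncBucketProxy.
public class ProxyCache<P> {
    private final ConcurrentMap<String, P> proxies = new ConcurrentHashMap<>();
    private final Function<String, P> createProxy; // your builder call goes here

    public ProxyCache(Function<String, P> createProxy) {
        this.createProxy = createProxy;
    }

    public P get(String bucketKey) {
        // computeIfAbsent builds the proxy at most once per key
        return proxies.computeIfAbsent(bucketKey, createProxy);
    }
}
```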
Bucket4j uses a unique architectural solution for configuration conflict resolution. Most libraries store the configuration on the client nodes while the state of the bucket lives in the storage; this can lead to unresolvable problems when the client configuration is not compatible with the persisted state. In contrast to the mainstream approach, Bucket4j stores both the state of the bucket and its configuration in the storage together, so a single call always observes the state together with the configuration it was produced under, and a conflict between them can be detected and resolved right there.
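The idea can be sketched as one atomic entry holding both pieces, so a single compare-and-swap sees the configuration and the state together (a toy model, not the real persisted format; the migration rule here is illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model: configuration version and token state live in one stored entry,
// so every request observes (and can migrate) both atomically.
public class StoredBucket {
    record Entry(long configVersion, long capacity, long tokens) { }

    private final AtomicReference<Entry> storage;

    public StoredBucket(long configVersion, long capacity) {
        this.storage = new AtomicReference<>(new Entry(configVersion, capacity, capacity));
    }

    public boolean tryConsume(long n, long clientConfigVersion, long clientCapacity) {
        while (true) {
            Entry cur = storage.get();
            Entry base = cur;
            if (clientConfigVersion > cur.configVersion()) {
                // the newer client config wins: migrate the persisted state first
                base = new Entry(clientConfigVersion, clientCapacity,
                        Math.min(cur.tokens(), clientCapacity));
            }
            if (base.tokens() < n) return false;
            Entry next = new Entry(base.configVersion(), base.capacity(), base.tokens() - n);
            if (storage.compareAndSet(cur, next)) return true;
        }
    }

    public long tokens() {
        return storage.get().tokens();
    }
}
```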
P.S. About the mock-up based on DelayOptimization: I suppose I will be able to provide it tomorrow.
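To make A2 concrete, the force-consume math can be modeled like this (a plain-Java toy, not the actual implementation): three nodes each locally served 5 requests against capacity 10, and each reports its consumption unconditionally during sync.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the math that allows negative available tokens:
// tryConsume refuses to go below zero, forceConsume does not.
public class NegativeTokens {
    private final AtomicLong tokens = new AtomicLong(10);

    public boolean tryConsume(long n) {
        long cur;
        do {
            cur = tokens.get();
            if (cur < n) return false;
        } while (!tokens.compareAndSet(cur, cur - n));
        return true;
    }

    public void forceConsume(long n) {
        tokens.addAndGet(-n); // may drive the count negative
    }

    public long available() {
        return tokens.get();
    }

    public static void main(String[] args) {
        NegativeTokens shared = new NegativeTokens();
        for (int node = 0; node < 3; node++) {
            shared.forceConsume(5); // each node syncs its 5 locally served requests
        }
        System.out.println(shared.available()); // 10 - 15 = -5
    }
}
```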