
tryConsumeAndReturnRemaining is taking a long time #479

Open
ddrzymala opened this issue Apr 22, 2024 · 10 comments
@ddrzymala

Hi, we are seeing long execution times when running the tryConsumeAndReturnRemaining method under high load.
We are planning to run up to 10,000 requests per second with scaling. At the moment it looks like we can't get above 1,000, as something is limiting us. After investigating, we narrowed it down to the tryConsumeAndReturnRemaining method, the Redis cluster setup, or both. We use the AWS ElastiCache Redis service.
Basically, we have wrapped tryConsumeAndReturnRemaining in an isAllowed method, and for the investigation we hardcoded the proxy bucket to narrow down the problem. Please find a snippet of our code below:

```java
public Mono<Response> isAllowed(String routeId, String id) {
    AsyncBucketProxy bucket;
    try {
        bucket = buckets.asAsync().builder()
                .build(id, () -> CompletableFuture.supplyAsync(() -> {
                    Bandwidth bandwidth = Bandwidth.builder()
                            .capacity(5000)
                            .refillIntervally(5000, Duration.ofSeconds(1))
                            .build();
                    return BucketConfiguration.builder()
                            .addLimit(bandwidth)
                            .build();
                }));
    }
    catch (Throwable e) {
        log.error("There was an error when initialising the bucket.", e);
        throw new BucketExceptions.BucketExecutionException("There was an error when initialising the bucket.");
    }
    return consumeToken(bucket)
            .name("isAllowed.method.call")
            .tap(Micrometer.observation(registry));
}

public Mono<Response> consumeToken(AsyncBucketProxy bucket) {
    return Mono.fromFuture(bucket.tryConsumeAndReturnRemaining(NUM_TOKENS_CONSUMED))
            .map(p -> new Response(p.getRemainingTokens() > 0,
                    Map.of(X_RATE_LIMIT_REMAINING_HEADER, Long.toString(p.getRemainingTokens()))));
}
```

We have extensive metrics to track everything, and it looks like the isAllowed method can take up to 40 seconds. When I remove the tryConsumeAndReturnRemaining call and hardcode the response, so that we do not talk to Redis via that method, everything is fast.
We are just not sure what can cause that slowness; we could have set something up wrong. Below is our Redis and ProxyManager setup:

```java
@Configuration
@AllArgsConstructor
@Slf4j
public class RedisConfig {

    private ApplicationConfig applicationConfig;

    @Bean(destroyMethod = "shutdown")
    RedisClusterClient redisClient(MeterRegistry meterRegistry) {

        MicrometerOptions options = MicrometerOptions.builder()
                .enable()
                .histogram(true)
                .build();
        ClientResources resources = ClientResources.builder()
                .commandLatencyRecorder(new MicrometerCommandLatencyRecorder(meterRegistry, options))
                .build();

        RedisURI uri = RedisURI.Builder.redis("redis-cluster-url", 6379)
                .withSsl(true)
                .withVerifyPeer(false)
                .withDatabase(0)
                .build();

        RedisClusterClient client = RedisClusterClient.create(resources, uri);
        client.setOptions(ClusterClientOptions.builder().build());

        return client;
    }

    @Bean
    public RedisTemplate<String, String> redisTemplate(LettuceConnectionFactory redisConnectionFactory) {
        RedisTemplate<String, String> redisTemplate = new StringRedisTemplate();
        redisTemplate.setConnectionFactory(redisConnectionFactory);
        return redisTemplate;
    }

    /**
     * TTL should roughly match the time window set for the rate limit.
     */
    @Bean
    LettuceBasedProxyManager<String> proxyManager(RedisClusterClient redisClusterClient) {

        ClientSideConfig clientSideConfig = ClientSideConfig.getDefault();

        StatefulRedisClusterConnection<String, byte[]> redisConnection = redisClusterClient
                .connect(RedisCodec.of(StringCodec.UTF8, ByteArrayCodec.INSTANCE));

        redisConnection.setReadFrom(ReadFrom.ANY_REPLICA);

        return LettuceBasedProxyManager.builderFor(redisConnection)
                .withExpirationStrategy(ExpirationAfterWriteStrategy.basedOnTimeForRefillingBucketUpToMax(Duration.ofSeconds(5)))
                .withClientSideConfig(clientSideConfig)
                .build();
    }
}
```

We use the latest library, bucket4j-redis 8.10.1, with the Lettuce Java client (we tried Redisson and experienced a similar issue).
Bucket4j is wired in with Spring Cloud Gateway.
We also tried another method from that class, tryConsume(), with the same result.
If we run up to 400 requests per second on a single instance it seems to be fine. It looks like we cannot scale up: when we run 4 instances we cannot get more than 1,000 requests per second, and we see the described slowness of up to 40 seconds in the isAllowed method, which basically just runs tryConsumeAndReturnRemaining plus some object construction.
If someone could point us in the right direction that would be great, as we may have misconfigured something.
We can provide more metrics on request. Thank you.

@vladimir-bukhtoyarov
Collaborator

This is a known issue. Compare-and-swap does not scale well in a distributed environment; that is why issue #410 is planned to be implemented.

However, things are not as bad as they look. If you have a few heavily loaded keys, and these keys are known at build time, then various optimizations can be applied. This feature is not documented yet; I am going to do some refactoring and document it in the scope of #415.
In my opinion, optimizations are a mature feature and you can use them. You can read several discussions to get more info:

Also, there are some examples at https://github.com/bucket4j/bucket4j/tree/8.10/bucket4j-examples/src/test/java/example/distributed/optimizers that help to understand how to configure more optimal interaction with the persistence layer.

There is only one rule that you must not forget: https://github.com/bucket4j/bucket4j/blob/8.10/bucket4j-core/src/main/java/io/github/bucket4j/distributed/proxy/RemoteAsyncBucketBuilder.java#L53

@vladimir-bukhtoyarov
Collaborator

Bucket4j is wired in with Spring Cloud Gateway.

If I understand this doc correctly (https://docs.spring.io/spring-cloud-gateway/reference/spring-cloud-gateway-server-mvc/filters/ratelimiter.html), you need to provide an AsyncProxyManager bean. If you describe your use case in more detail, I will be able to suggest something scalable for your case, via subclassing (or decorating) the existing proxy-manager implementations; my idea is to cache highly loaded buckets with optimizers applied.

@ddrzymala
Author

Thank you for the prompt reply. Yes, you are correct in pointing to the Spring Cloud Gateway (SCG) docs. We use SCG to filter URLs and apply a rate limit if a URL matches given criteria. We have a custom rate limiter class that extends AbstractRateLimiter. We had to do that because we have a different rate limit per customer. We get the customer ID from the URL, and that ID is used to find the customer's limits and then construct the bucket. Customer limits are also stored in Redis, but they are cached as they do not change that often.

We tried your suggestion and tested the optimisation with one proxy bucket that is reused (hardcoded and at class scope). After running with that and the batch optimisation applied, it seems to be working and scaling much better. Thanks for pointing out the rule about cheap/not-cheap objects, as this is where I went wrong.
The problem is that I'm not sure what the best practice is for reusing the AsyncBucketProxy so that it is not created each time and the batching optimisation can be applied.

At the moment I'm trying Caffeine caching, but I'm not sure this is the best approach, as it is a local cache and we will have a few instances running.

```java
public Mono<Response> isAllowed(String routeId, String id) {

    AsyncBucketProxy bucket = bucketProxyService.cachedBucket(id, routeId);
    if (bucket == null) {
        log.error("There was an error when initialising the bucket.");
        throw new BucketExceptions.BucketExecutionException("There was an error when initialising the bucket.");
    }
    return consumeToken(bucket)
            .name("isAllowed.method.call")
            .tap(Micrometer.observation(registry));
}
```

Please could you tell me whether that would be a problem, or suggest another way to store that object (AsyncBucketProxy bucket = bucketProxyService.cachedBucket(id, routeId);). Also, what is the trade-off of using the batching optimisation? Loss of precision?

@vladimir-bukhtoyarov
Collaborator

The ideal way would be to wait for #415; the changes proposed in the scope of that issue will help to eliminate the need to cache anything. However, that issue does not have a clear ETA: maybe one month, maybe two. It depends on many factors, including how busy I am with my regular work. I suppose that is unacceptable for you.

I think that, in general, Caffeine is the right choice. But if you plan to use only batching, and do not plan to use the delaying or predicting optimization, then a ConcurrentHashMap will be enough, in my opinion: store the entry while there are concurrent requests, and remove the entry when the last request has been executed and there is no new request for the same key.
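The per-key reuse described above can be sketched with plain `ConcurrentHashMap.computeIfAbsent`. Note that `BucketHandle` and `bucketFor` are hypothetical stand-ins for bucket4j's `AsyncBucketProxy` and a real builder call; the point illustrated is only the caching pattern: the (relatively expensive, optimizer-carrying) proxy is built at most once per key and then shared by every request for that key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of per-key proxy caching. BucketHandle stands in for the real
// AsyncBucketProxy; building it is the "expensive" step we want to do
// at most once per key.
public class ProxyCacheSketch {
    static final AtomicInteger builds = new AtomicInteger();

    static final class BucketHandle {
        final String key;
        BucketHandle(String key) { this.key = key; builds.incrementAndGet(); }
    }

    static final Map<String, BucketHandle> cache = new ConcurrentHashMap<>();

    // computeIfAbsent guarantees the builder runs at most once per key,
    // even when many threads request the same key concurrently.
    static BucketHandle bucketFor(String customerId) {
        return cache.computeIfAbsent(customerId, BucketHandle::new);
    }

    public static void main(String[] args) {
        BucketHandle a1 = bucketFor("customer-a");
        BucketHandle a2 = bucketFor("customer-a"); // cached, no second build
        BucketHandle b  = bucketFor("customer-b");
        System.out.println("same instance: " + (a1 == a2) + ", builds: " + builds.get());
        // → same instance: true, builds: 2
    }
}
```

Unlike Caffeine, a bare map never evicts, so for many short-lived customer IDs you would still want the removal-on-idle bookkeeping described above.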

@ddrzymala
Author

Thanks for the reply. You are correct, we cannot wait that long, but it is good to know that this will improve in the future.
I was not aware you could combine optimisations. We tried batching first to see if it made any difference. So how would you combine them? Please could you point us to some examples or docs. Thank you for the great support; it was very helpful.

@vladimir-bukhtoyarov
Collaborator

vladimir-bukhtoyarov commented Apr 23, 2024

I was not aware you can combine optimisations.

In general, optimizations cannot be combined. But the good news is that every other optimization is built on top of batching. So when you choose delaying, batching is already included behind the scenes.
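As a toy illustration of why batching helps (this is not bucket4j's implementation, just a sketch of the idea): many concurrent consume calls are coalesced so that the shared token counter, standing in for Redis, is hit once per batch instead of once per caller, turning N compare-and-swap round trips into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of request batching: 200 tryConsume-style requests are
// queued, then one worker drains the whole backlog and serves it with
// a single "round trip" to the shared counter (standing in for Redis).
public class BatchingSketch {
    static final AtomicInteger roundTrips = new AtomicInteger();
    static final BlockingQueue<CompletableFuture<Boolean>> pending = new LinkedBlockingQueue<>();
    static long remainingTokens = 1000;

    public static void main(String[] args) throws Exception {
        int callers = 200;
        List<CompletableFuture<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < callers; i++) {
            CompletableFuture<Boolean> f = new CompletableFuture<>();
            pending.put(f);
            results.add(f);
        }

        Thread batcher = new Thread(() -> {
            try {
                while (true) {
                    List<CompletableFuture<Boolean>> batch = new ArrayList<>();
                    batch.add(pending.take());    // wait for at least one request
                    pending.drainTo(batch);       // coalesce everything that piled up
                    roundTrips.incrementAndGet(); // one "Redis" round trip per batch
                    for (CompletableFuture<Boolean> f : batch) {
                        boolean allowed = remainingTokens > 0;
                        if (allowed) remainingTokens--;
                        f.complete(allowed);
                    }
                }
            } catch (InterruptedException ignored) { }
        });
        batcher.setDaemon(true);
        batcher.start();

        for (CompletableFuture<Boolean> f : results) {
            f.get(5, TimeUnit.SECONDS);
        }
        System.out.println("callers=" + callers + " roundTrips=" + roundTrips.get());
        // → callers=200 roundTrips=1
    }
}
```

The trade-off mentioned earlier in the thread is visible here too: requests in one batch are decided against a counter that is only refreshed once per batch, which is where the small loss of precision comes from.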

@ddrzymala
Author

We are getting promising results with the optimisation: we can now comfortably scale up to 3,000 requests per second. We need to get to 5,000. My question is whether you can suggest, from your experience, the best values for the following:

  • caching proxy buckets, as we discussed above: what would be a sensible value? I have set it to 300 seconds for now.
  • the TTL for the Redis bucket state? I have it set to 10 seconds at the moment.

We use a Redis cluster setup on AWS ElastiCache. In the ProxyManager we have this setup:

```java
StatefulRedisClusterConnection<String, byte[]> redisConnection = redisClusterClient
        .connect(RedisCodec.of(StringCodec.UTF8, ByteArrayCodec.INSTANCE));

redisConnection.setReadFrom(ReadFrom.ANY_REPLICA);
```

and

```java
RedisClusterClient redisClusterClient(MeterRegistry meterRegistry) {

    MicrometerOptions options = MicrometerOptions.builder()
            .enable()
            .histogram(true)
            .build();
    ClientResources resources = ClientResources.builder()
            .commandLatencyRecorder(new MicrometerCommandLatencyRecorder(meterRegistry, options))
            .build();

    RedisURI uri = RedisURI.Builder.redis("xxxxxxxxxxx", 6379)
            .withSsl(true)
            .withVerifyPeer(false)
            .withDatabase(0)
            .build();

    RedisClusterClient client = RedisClusterClient.create(resources, uri);
    client.setOptions(ClusterClientOptions.builder().build());

    return client;
}
```

Redis has one shard and five nodes.
Is there any setting we could add to improve things?
I know you do not have all the context, but you may know from your experience. Thank you.

@vladimir-bukhtoyarov
Collaborator

vladimir-bukhtoyarov commented Apr 26, 2024

@ddrzymala

I have no experience with Redis in production; my domain is limited to things like Apache Ignite, Oracle Coherence and Hazelcast. To be clear, all the current integrations for Redis were created in reaction to user feature requests, and they were implemented based on common sense. So fine-tuning Redis should be your personal exercise.

the TTL for the Redis bucket state?

If you plan to use caching of bucket proxies via Caffeine together with the delaying optimization, then it is obvious that the expire-after-read on the Caffeine cache should be longer than the maximum synchronization delay.

we can now comfortably scale up to 3,000 requests per second. We need to get to 5,000.

Those are not good numbers; I believe it can be better. Did you decide to stay with batching? The delaying optimization, in my opinion, should provide better numbers.

P.S.
I would recommend configuring an optimization listener in order to observe how effectively the optimization performs. An example of a listener can be found there.

@ddrzymala
Author

Thanks for the reply. Batching is our baseline, as my understanding is that with delaying or predicting (which also uses delaying) you lose some precision. We can accept a small loss there, but we are working with batching first; then we can add the others to see the difference.
It is encouraging that you think we can go higher than 5,000. We hit that last night, but the response time was not great, so we are investigating it now.
Thanks.

@vladimir-bukhtoyarov
Collaborator

@ddrzymala

The predicting optimization should reduce the loss in accuracy; it should be suitable for most cases.
