PreciseTopicPublishRateLimiterEnable doesn't always work. #10382

Closed

galrose opened this issue Apr 26, 2021 · 2 comments
Labels
type/bug

Comments

@galrose
Contributor

galrose commented Apr 26, 2021

Describe the bug
When using pulsar-perf and writing messages either in large batches (>100) or with a large number of outstanding messages (>100), the publish rate is not limited precisely.
When limiting either -bm or -o to a small number, around 5 and preferably 1, it works perfectly.
The easiest way to check is to leave -bm and -o unset in pulsar-perf.

Slack thread: https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1619419736156600?thread_ts=1619419736.156600&cid=C5Z4T36F7

To Reproduce
Steps to reproduce the behavior:

  1. preciseTopicPublishRateLimiterEnable=true in the broker.conf
  2. Create a new tenant, namespace, and topic.
  3. Run 'pulsar-admin namespaces set-publish-rate mytenant/default -b 102400'
  4. Run 'pulsar-perf produce -s 1024 -threads 1 -r 1000'
  5. Observe, both in the broker metrics and in the pulsar-perf output, that the rate is not limited precisely.

Expected behavior
I expect the limit to be precise regardless of the batch size or the number of outstanding messages.

Desktop (please complete the following information):

  • OS: CentOS 7
@lhotari
Member

lhotari commented Apr 26, 2021

@galrose thanks for reporting.

When reproducing, I also needed to specify a message limit on the publish rate.
For example:
pulsar-admin namespaces set-publish-rate mytenant/default -b 102400 -m 1000
That seems to be another bug, perhaps the "open issue" mentioned in PR #10384.

I was able to reproduce the issue. When the number of outstanding messages is high, the rate limiting is very inconsistent.

For example:

17:11:02.944 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    138.3  msg/s ---      1.1 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 4596.594 ms - med: 4423.903 - 95pct: 7853.471 - 99pct: 7902.623 - 99.9pct: 7914.623 - 99.99pct: 7915.647 - Max: 7915.647
17:11:12.996 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    494.7  msg/s ---      3.9 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 2469.594 ms - med: 1781.367 - 95pct: 5994.559 - 99pct: 6994.687 - 99.9pct: 6997.087 - 99.99pct: 7776.479 - Max: 7776.479
17:11:23.023 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    754.8  msg/s ---      5.9 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 1304.984 ms - med: 1214.711 - 95pct: 2000.375 - 99pct: 2000.927 - 99.9pct: 2001.071 - 99.99pct: 2001.095 - Max: 2001.215
17:11:33.044 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    626.0  msg/s ---      4.9 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 1658.938 ms - med: 1773.103 - 95pct: 2983.343 - 99pct: 3003.791 - 99.9pct: 3003.887 - 99.99pct: 3003.903 - Max: 3004.079
17:11:43.069 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    501.4  msg/s ---      3.9 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 1930.409 ms - med: 1995.135 - 95pct: 3782.767 - 99pct: 3999.711 - 99.9pct: 4776.831 - 99.99pct: 4997.055 - Max: 4998.015

The current rate limiter algorithm seems to have its limitations. It appears to work by switching the Netty channel's "auto read" flag to false to apply backpressure once the rate limit is reached. However, the rate limit seems to reset once per second, with the consequence that the content buffered during the backpressure window gets resumed and flushed all at once. A sketch of this pattern follows.
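A minimal sketch of the pattern described above, assuming a limiter whose budget resets on a fixed one-second tick and which re-enables Netty's autoRead at that moment. This is illustrative only, not the actual broker code; the class and method names are hypothetical:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    class PerSecondResetLimiter {
        private final long permitsPerSecond;
        private final AtomicLong used = new AtomicLong();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        PerSecondResetLimiter(long permitsPerSecond, Runnable resumeAutoRead) {
            this.permitsPerSecond = permitsPerSecond;
            // The whole budget becomes available again on every one-second tick,
            // so anything buffered while autoRead was off is flushed in a burst.
            scheduler.scheduleAtFixedRate(() -> {
                used.set(0);
                resumeAutoRead.run(); // e.g. channel.config().setAutoRead(true)
            }, 1, 1, TimeUnit.SECONDS);
        }

        // Returns false once the budget for the current second is spent;
        // the caller then turns autoRead off to apply backpressure.
        boolean tryAcquire(long permits) {
            return used.addAndGet(permits) <= permitsPerSecond;
        }
    }

Under such a scheme, everything that piled up while autoRead was off is released the instant the counter resets, so the observed rate oscillates around the configured limit instead of staying at it. With -o 1 or -bm 1 only a few messages can accumulate during the backpressure window, which would explain why small values appear to work perfectly.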

@galrose
Contributor Author

galrose commented Apr 26, 2021

Yes, you are correct, my bad: you can also set just -m for the message limit and reproduce it that way as well.
I'm not sure whether your PR covers this issue too, but I'll check if it fixes it as soon as it is merged.

@galrose galrose closed this as completed Nov 16, 2021