-
Notifications
You must be signed in to change notification settings - Fork 3.6k
PIP 37: [pulsar-client] support large message size #4400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@rdhabalia There are some PRs on going for PIP-36, will it partly solve this issue? |
No, right now, pulsar can only support up to 5MB message size and PIP-36 doesn't support large message but it allows broker to configure max message size from 0 to 5MB. we need it for customers who want to publish large message into large-scale specialized pulsar instance for which txn might not be feasible option. it's similar like chunking in kafka mentioned here. |
Hi @rdhabalia, sorry for the delay. After PIP36 (#4247 ), pulsar could support message size large than 5MB. |
@jiazhai sorry for late response as I had missed your comment.
That's incorrect. #4247 is not a useful feature and that could be used for a specific usecase but we can not use for large scale and multi tenant system where same bookie is serving topics from different tenants. |
* recommended configuration at pulsar producer and consumer. Recommendation to use this feature: | ||
* | ||
* <pre> | ||
* 1. This feature is right now only supported by non-shared subscription and persistent-topic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me feel that we are not on the right track for implementing this feature. If a user is required to do these many things in order to enable this feature, it seems that we are not doing something correctly. Also, this seems to make the producer and consumer logic very complicated.
Can this chunking feature be implemented in a high level? E.g. over a producer wrapper or a consumer wrapper? E.g. you can chunk a large message into smaller messages and add the chunk id in the message properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes harder to keep this responsibility in wrapper because it also requires to stitch messages by handling all failure scenarios. So, it's more logical to keep this responsibility with in client which makes it more simpler to handle them. Other system also does similar thing at client side.
Also, many customers do such chunking and stitching at application level for their pipeline to transport large messages without impacting cluster and other tenants. But they also have to take care message-sizing, chunking, dedup and failure scenario which make harder to solve this problem. Therefore, it makes user's life easier if this feature is part of the client lib.
Right now, all the users need this feature for the streaming usecase which needs exclusive/failure subscription to consume data and push it to grid or perform Online analytical processing.. so, we have introduced this feature for one type of subscription for now and will see in future if we need it for other type as well.
If a user is required to do these many things in order to enable this feature,
User can use this feature with default values as well but here, we have added documentation to utilize this feature with better performance by tuning configuration. So, there shouldn't be complexity to support this feature and it should be straight forward to use it. In fact, this feature is more useful when pulsar is deployed on large scale system and user can deal with large size of messages without impacting other tenants.
we have many customers need this feature so, can someone please review the PR and can we merge the PR soon. |
@jiazhai @codelipenghui can you help review @rdhabalia 's change? |
|
@rdhabalia Would you please resolve the conflicts? So that we can onboard this PR in 2.6.0 release. |
c52ab25
to
595467a
Compare
@@ -96,6 +96,9 @@ private ProducerBuilderImpl(PulsarClientImpl client, ProducerConfigurationData c | |||
|
|||
@Override | |||
public CompletableFuture<Producer<T>> createAsync() { | |||
// config validation | |||
checkArgument(!(conf.isBatchingEnabled() && conf.isChunkingEnabled()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdhabalia The change looks good to me. About to enable the chunk, is it possible to add a config for the max chunk size and enable batching at the same time. If the message size is greater than the max chunk size and the chunk is enabled, we send messages with chunks, If the message is size is smaller than the max chunk size, we also can add them to a batch and if chunk is enabled, we make sure the match size can not exceed the max chunk size. So that it can be better applied when the user cannot confirm the message size for example from 100 bytes to 10MB. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codelipenghui we can do but it will be little tricky to cover all usecases as we use batch-container if batching is enabled and combination of non-batch and batching will be little tricky. I have already put good amount of efforts to rebase and doing perf testing with changes. so, let's merge this PR and I will create a issue to add this feature and address it in separate PR as it will also require some effort to cover all unknowns.
So, can you please review this PR again as I am done with rebasing.
I just rebased all the files and I can again see conflicts.. :-) please review so, I will rebase it again and merge it soon. |
fix producer fix ref counts add timeouts add validation fix recycling fix stats review fix test fix test fix send message and expiry-consumer-config fix schema test fix chunk properties
* PIP 37: [pulsar-client] support large message size fix producer fix ref counts add timeouts add validation fix recycling fix stats review fix test fix test fix send message and expiry-consumer-config fix schema test fix chunk properties * fix test
* PIP 37: [pulsar-client] support large message size fix producer fix ref counts add timeouts add validation fix recycling fix stats review fix test fix test fix send message and expiry-consumer-config fix schema test fix chunk properties * fix test
### Motivation The PR apache#4400 introduced new params on the pulsar-client consume command but it broked some old ones. ### Modifications This PR fixes this problem to restore the good param behavior.
### Motivation The PR apache#4400 introduced new params on the pulsar-client consume command but it broked some old ones. ### Modifications This PR fixes this problem to restore the good param behavior.
### Motivation The PR apache#4400 introduced new params on the pulsar-client consume command but it broked some old ones. ### Modifications This PR fixes this problem to restore the good param behavior.
### Motivation The PR apache#4400 introduced new params on the pulsar-client consume command but it broked some old ones. ### Modifications This PR fixes this problem to restore the good param behavior.
Motivation
Addressing PIP-37 to support large size of messages in pulsar.
Modification
This PR supports large message size for the non-shared subscription and it has only client side changes.
I will create a separate PR to support large size message for shared-subscription.