Possible Message Batching Bug? #2797
Comments
@road-cycling thanks for reporting this, would it be possible for you to upgrade to a newer version of Sarama? At least v1.41.1, but ideally picking up the latest v1.42.2. In particular, #2628 disambiguated between messages failing the local client-side Producer.MaxMessageBytes check and those being genuinely rejected by the remote cluster, so that would help to narrow down where you're seeing this occur.
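For reference, a minimal sketch (not taken from either PR; it assumes the current IBM/sarama import path, and the package name is arbitrary) of how the two cases could be told apart when draining the async producer's Errors() channel:

```go
package kafkautil

import (
	"errors"
	"log"

	"github.com/IBM/sarama"
)

// classifyProducerErrors drains the async producer's error channel and
// separates messages that failed the local Producer.MaxMessageBytes check
// (a sarama.ConfigurationError, never sent on the wire) from messages the
// remote cluster genuinely rejected (sarama.ErrMessageSizeTooLarge).
func classifyProducerErrors(producer sarama.AsyncProducer) {
	for perr := range producer.Errors() {
		var confErr sarama.ConfigurationError
		switch {
		case errors.As(perr.Err, &confErr):
			log.Printf("client-side rejection (never sent): %v", perr.Err)
		case errors.Is(perr.Err, sarama.ErrMessageSizeTooLarge):
			log.Printf("rejected by broker (message.max.bytes): %v", perr.Err)
		default:
			log.Printf("other produce error: %v", perr.Err)
		}
	}
}
```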
I was looking at this today. We were also playing with producer vs broker compression, testing sending batches of data, some all zeroes, some random (which won't compress). We quickly found that we (like @road-cycling above) had to raise the same limit (presumably following the same advice we found in #2142 (comment)).

And while that happily fixed our ability to send large compressible messages (e.g. 2MB of all zeroes), it then broke our ability to send other messages, such as 400KB of random data, or 400KB of zeroes to a topic with no producer compression. For example, if we tried to send 10 x 400KB messages in quick succession, we would see the first message sent in a batch on its own and succeed; however, the following 9 messages would batch together (making a total size of approx 3.2MB), which would then result in the error below. Output from our test app:

And this is because an unintended side-effect of maxing out that setting shows up at

Lines 253 to 255 in 0ab2bb7

which, as I understand it, controls the batching logic. With that value maxed out, messages sent by themselves (low traffic) go through fine, but when batched up they fail miserably. So I do think we need to keep the check at

Lines 453 to 457 in 0ab2bb7

That would also help clear up #2851 and all the associated issues there, as we'd likely opt to ignore that check.
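If the batching threshold really does track Producer.MaxMessageBytes as described, a minimal sketch of the kind of configuration split we're after might look like this (the field choices and all values are illustrative assumptions, not a confirmed fix):

```go
package kafkautil

import "github.com/IBM/sarama"

// newProducerConfig raises the per-message limit for large compressible
// payloads while capping the flush threshold separately, so a burst of
// big messages doesn't accumulate into one oversized batch. All values
// are illustrative, and Flush.Bytes is best-effort per the Sarama docs.
func newProducerConfig() *sarama.Config {
	cfg := sarama.NewConfig()
	cfg.Producer.MaxMessageBytes = 2 * 1024 * 1024 // allow a single 2MB message
	cfg.Producer.Flush.Bytes = 900 * 1024          // flush well before the broker limit
	cfg.Producer.Compression = sarama.CompressionZSTD
	return cfg
}
```

Producer.Flush.MaxMessages could cap batches by message count instead, if payload sizes are predictable.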
Description
I'm noticing that batched message submissions to the Kafka broker are getting rejected because the message is too large. Initially I thought this was due to the post-compression message size being too large; however, logging added to the Kafka.Errors feedback channel (as shown below) showed messages as small as 900B being rejected.
Logs
Logging Code
Looking closer, these errors were bursty and occurred all at once. I know that at least one Kafka client will take batching errors, break the batch down, and resubmit, but looking at the code in this repo, that is not the case here. I was looking through the code and docs and saw that most of the configs are described as 'best-effort', with the docs stating: 'By default, messages are sent as fast as possible, and all messages received while the current batch is in-flight are placed into the subsequent batch.' Is it possible that the buffers fill past the limit while an existing publish is in flight, so the subsequent batch ends up over the size limit?
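A minimal sketch of that style of size-reporting error logging (an illustration only, not the collapsed logging code from above; the package name is arbitrary):

```go
package kafkautil

import (
	"log"

	"github.com/IBM/sarama"
)

// logProducerErrors records the payload size of every rejected message from
// the async producer's feedback channel (Producer.Return.Errors is true by
// default, so the channel is populated unless that was disabled).
func logProducerErrors(producer sarama.AsyncProducer) {
	for perr := range producer.Errors() {
		size := 0
		if perr.Msg.Value != nil {
			size = perr.Msg.Value.Length() // pre-compression payload size
		}
		log.Printf("publish to %q failed: %v (value %dB)", perr.Msg.Topic, perr.Err, size)
	}
}
```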
Versions
Note: I'm unable to reproduce this on my laptop. I can only reproduce it in a production environment where throughput is 350GB/min, with 13M messages consumed/min and 13M messages published/min. Adding more hosts to the cluster does alleviate the issue, even though CPU/memory utilization is very low.
Configuration
Logs
Additional Context