Delivering disordered messages during shutdown (rebalance), causing message loss #255
@longquanzheng I am currently working on implementing cluster functionality directly into sarama - IBM/sarama#1099. Only a few small bits are missing at this point, and I aim to finish those next week.
@longquanzheng I am nevertheless happy to fix the race if you can identify how it happens. We have quite a "fuzzy" test in our suite (see https://github.com/bsm/sarama-cluster/blob/master/consumer_test.go#L284) which seems to pass without losing any messages.
@dim I found that sarama-cluster delivers messages out of order once Close() is called to shut down the library. Here is an example: worker A owns the partition and is consuming 1, 2, 3, ..., 9, 10, everything is good; then we call Close() to shut down, and it starts to receive many misordered messages, for example 100 (skipping 11-99). To mitigate this, I made our consuming process sleep for 2 seconds before shutting down sarama-cluster, and that works (uber/cadence@ec218d7), but it is not a final solution. We want to understand why sarama-cluster starts to deliver out of order when we call Close(). Note that this issue only reproduces in our production environment when hosts are busy, where we run lots of processes concurrently. We have not been able to reproduce it on a laptop or on idle hardware.
@longquanzheng From what I know, messages are only ordered per topic/partition. Also,
@dim we finally found that the bug is here: georgeteo/sarama@9618a79
Actually, the issue is with the sarama/sarama-cluster contract. Specifically, in https://github.com/bsm/sarama-cluster/blob/master/partitions.go#L89, you use [...]. We use the [...]. The proposed fix would be to use [...].
Here is the PR with the fix: #258.
Thanks for accepting the PR. Can you tag a new release as well?
@georgeteo looks like you didn't run the tests 😄
I've tried the #258 fix today and had to revert it.
By the way, #258 worked for me with
@danpmx, @jpiper, I'm unable to reproduce the crash. Can you post your consumer configuration? When running the following non-partition consumer code, I don't see either a crash or an infinite rebalance:
Two consumer workers: Worker 1
Worker 2
@dim: do you have any clues why the non-partition consumer might be broken with my recent change?
@georgeteo I’m using this config:

```go
kafkaConfig := cluster.NewConfig()
sarama.MaxResponseSize = 104857600
sarama.MaxRequestSize = 104857600
kafkaConfig.Version = sarama.V1_1_0_0 // Sarama will default to 0.8
kafkaConfig.Group.PartitionStrategy = cluster.StrategyRoundRobin
kafkaConfig.Consumer.Return.Errors = true
kafkaConfig.Group.Return.Notifications = true
kafkaConfig.ChannelBufferSize = 1000
```
I am able to repro behavior identical to what @danpmx reported. I dug into this, and the root cause appears to be a deadlock in the underlying sarama library (which existed even before the fix added by george). But the new fix causes this deadlock to manifest differently: after the fix, it surfaces as a bunch of rebalance errors; before the fix, the deadlock led to the consumer not receiving any messages at all. Following is the potential bug I discovered:
So, I see two issues now:
Whoever owns sarama: please verify whether my analysis above is valid and update this ticket.
@georgeteo our configuration:
|
@venkat1109 The rebalance ->
@imjustfly - sarama.PartitionConsumer.Close() will block because of this for range loop (and not because of the drain, which runs in its own goroutine); i.e., sarama.PartitionConsumer.Close() will return only after the child.errors channel is closed.
@venkat1109
|
@imjustfly please see the sequence of steps I described above.
|
- Shutdown all partitions before shutting down the sarama consumer. This sidesteps bsm/sarama-cluster#255 and ensures that the shutdown completes in a reasonable timeframe.
- Wait for PartitionConsumer shutdown before consuming messages.
- Use Sarama's PartitionConsumer mock instead of relying on our own, because it is richer and well tested.
We are experiencing message loss when restarting workers; it happens consistently.
But when I added these logs, the message loss disappears:
longquanzheng@0c61134
I think this is because the logging slows down the CommitOffsets() function. There must be some race condition involved.
Do you have any idea?