Remove log unacked msg. #14246
### Motivation

After #13383 fixed the batch ack issue, we found that the unacked-message count could become negative (#14246). At first, we thought this was a normal side effect of message redelivery, but after digging into the logic we found that it is a bug.

The test is copied from #14246:

```java
// Acknowledge even-numbered messages and negatively acknowledge odd-numbered ones.
for (int i = 0; i < 50; i++) {
    Message<String> msg = consumer.receive();
    if (i % 2 == 0) {
        consumer.acknowledgeAsync(msg);
    } else {
        consumer.negativeAcknowledge(msg);
    }
}
```

When a message is negatively acknowledged, `Consumer#redeliverUnacknowledgedMessages` invokes:
https://github.com/apache/pulsar/blob/b22f70658927e07e3726d32290065f47313070b9/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java#L900-L912

When calculating `totalRedeliveryMessages`, it checks whether `pendingAcks` contains each message and then removes the message from `pendingAcks`. (Dispatching messages adds them to `pendingAcks`.)

So with the test above, a message can be negatively acknowledged first and acknowledged with `acknowledgeAsync` afterward. `acknowledgeAsync` maps to `Consumer#individualAckNormal`, which decreases the unacked-message count here:
https://github.com/apache/pulsar/blob/b22f70658927e07e3726d32290065f47313070b9/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java#L543-L561

It does not check `pendingAcks`, and this is the root cause. Line 556 should be moved to line 545.
(cherry picked from commit 78bfaa2)
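To make the ordering problem above concrete, here is a toy model of the bookkeeping, assuming a single-message entry whose negative ack is processed before its ack. It is not Pulsar's `Consumer` class: `ToyConsumer`, `dispatch`, `redeliver`, `ackWithoutCheck`, `ackWithCheck` and the `"ledgerId:entryId"` string keys are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the broker-side bookkeeping described above.
// NOT Pulsar's Consumer class; every name here is hypothetical.
final class ToyConsumer {
    // Dispatched entries awaiting ack, keyed by a hypothetical
    // "ledgerId:entryId" string and mapped to the number of messages in the entry.
    private final Map<String, Integer> pendingAcks = new HashMap<>();
    private int unackedMessages = 0;

    // Dispatching adds the entry to pendingAcks and increases the counter.
    void dispatch(String position, int batchSize) {
        pendingAcks.put(position, batchSize);
        unackedMessages += batchSize;
    }

    // Redelivery only counts entries still present in pendingAcks,
    // and removes them while counting.
    void redeliver(String position) {
        Integer batchSize = pendingAcks.remove(position);
        if (batchSize != null) {
            unackedMessages -= batchSize;
        }
    }

    // Unguarded ack path: decreases the counter without checking pendingAcks.
    void ackWithoutCheck(String position, int ackedCount) {
        pendingAcks.remove(position);
        unackedMessages -= ackedCount;
    }

    // Guarded ack path: decreases the counter only if the entry was still pending.
    void ackWithCheck(String position, int ackedCount) {
        if (pendingAcks.remove(position) != null) {
            unackedMessages -= ackedCount;
        }
    }

    int getUnackedMessages() {
        return unackedMessages;
    }

    public static void main(String[] args) {
        ToyConsumer unguarded = new ToyConsumer();
        unguarded.dispatch("1:0", 1);
        unguarded.redeliver("1:0");          // negative ack is processed first
        unguarded.ackWithoutCheck("1:0", 1); // ack for the same entry arrives later
        System.out.println("without check: " + unguarded.getUnackedMessages()); // -1

        ToyConsumer guarded = new ToyConsumer();
        guarded.dispatch("1:0", 1);
        guarded.redeliver("1:0");
        guarded.ackWithCheck("1:0", 1);
        System.out.println("with check: " + guarded.getUnackedMessages()); // 0
    }
}
```

In this sketch the guarded variant ignores an ack for an entry that has already left `pendingAcks`, so the counter cannot be decremented twice for the same entry.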
@Technoboy- following up here: now that we have merged #14288, is this still required? Given that the log line helped find a bug, I am concerned that removing it might hide other bugs.
(cherry picked from commit 6b828b4)
Yes, good idea. Thanks, @michaeljmarshall. I have pushed a new patch: #14501.
### Motivation
#13383 fixed a batch message ack issue, but users have recently found that the unacked-message count may become negative, which triggers the log in `pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java`, lines 932 to 934 (commit ca64c67).
We then used the test below to reproduce the case:
Because the ack bitset is not sent back to the broker when redelivery occurs, the broker cannot calculate how many messages have already been acked. So in this case, the unacked-message count may become negative, but this has no other impact.
To reduce user confusion, we suggest deleting the log for now.
Later we may find a better way to handle the redelivery case.
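One way to read the bitset explanation above is sketched below as a toy model of a single batched entry. It assumes that individual acks inside the batch each decrease the counter by one and that a redelivery request identifies only the entry, not which indexes inside it were already acked, so the broker can only subtract the full batch size. `ToyBatchEntry`, `ackIndex`, `redeliverWithoutBitset` and `redeliverWithBitset` are hypothetical names, not Pulsar APIs.

```java
import java.util.BitSet;

// Toy model of one batched entry (hypothetical names, not Pulsar code).
final class ToyBatchEntry {
    private final int batchSize;
    private final BitSet ackedIndexes = new BitSet();
    private boolean pending = true;
    private int unackedMessages;

    ToyBatchEntry(int batchSize) {
        this.batchSize = batchSize;
        this.unackedMessages = batchSize; // the whole batch counts as unacked on dispatch
    }

    // Individual ack of one message inside the batch (assumed to decrement by one).
    void ackIndex(int index) {
        if (pending && !ackedIndexes.get(index)) {
            ackedIndexes.set(index);
            unackedMessages -= 1;
        }
    }

    // Assumption: the redelivery request carries only the entry position,
    // so the full batch size is subtracted.
    void redeliverWithoutBitset() {
        if (pending) {
            pending = false;
            unackedMessages -= batchSize;
        }
    }

    // For comparison: if the bitset were known, only the still-unacked
    // messages in the batch would be subtracted.
    void redeliverWithBitset() {
        if (pending) {
            pending = false;
            unackedMessages -= batchSize - ackedIndexes.cardinality();
        }
    }

    int getUnackedMessages() {
        return unackedMessages;
    }

    public static void main(String[] args) {
        ToyBatchEntry withoutBitset = new ToyBatchEntry(10);
        for (int i = 0; i < 4; i++) {
            withoutBitset.ackIndex(i); // 4 of 10 messages acked individually
        }
        withoutBitset.redeliverWithoutBitset();
        System.out.println("without bitset: " + withoutBitset.getUnackedMessages()); // -4

        ToyBatchEntry withBitset = new ToyBatchEntry(10);
        for (int i = 0; i < 4; i++) {
            withBitset.ackIndex(i);
        }
        withBitset.redeliverWithBitset();
        System.out.println("with bitset: " + withBitset.getUnackedMessages()); // 0
    }
}
```

Under these assumptions the count goes negative by exactly the number of messages acked before the redelivery, which matches the symptom described above while leaving message delivery itself unaffected.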
### Documentation
no-need-doc