Consumer occasionally gets stuck when restarting a consumer with the key_shared subscription type. #10284
Comments
We see this problem roughly 50% of the time when we do a rolling restart of our consumers, e.g., 5 at a time for a total of 20. Normally the only way to fix it is to unload the bundle housing the topic. We've been experiencing this on all versions since 2.6.3 and have seen several "stuck consumer" type issues marked as resolved, but issues with key_shared still remain. Is there anything we can do when we experience this issue to assist in getting it fixed?
@james-bright-helix You can try out 2.8.1 or 2.7.3, which contain #10920
@codelipenghui sorry, my message wasn't clear. We're on 2.8.1 and have tried every version since 2.6.3 but are still suffering from stuck key_shared consumers. I was hoping the attached PR was going to fix our issue, but it seems not to be progressing, hence the offer to help provide more details.
@james-bright-helix Do you have a way to reproduce the issue on 2.8.1?
@codelipenghui not consistently in a way that's not disruptive. We have to bounce our production app, and then it happens frequently. We see it very rarely in our non-production envs, which are much smaller. Are there any additional logs/metrics we can gather to share when it does happen?
@codelipenghui I thought I should mention it's possibly related to #12208, as we are using reconsumeLaterAsync() on all topics (if we have an environmental error we want to retry after a delay), and on one topic we also use deliverAfter(), although we don't usually see anything on the retry topics when it's stuck. As mentioned in that issue, it's not clear whether these are expected to work with key_shared subscriptions.
In the following scenario, a consumer gets stuck after a restart.
First step: two consumers, e.g. consumer1 and consumer2, use the key_shared subscription type in the same group.
Second step: the broker receives consumer1's flow command with 1000 permits but has not received consumer2's flow command.
Third step: the broker starts sending messages to the consumers, but the messages' keys are assigned to consumer2, so it does not send any message to either consumer.
Fourth step: on the next dispatch loop, getRestrictedMaxEntriesForConsumer always returns 0, so no messages are sent.
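The steps above can be sketched as a small toy simulation. This is only an illustration under stated assumptions, not the broker's actual dispatcher: `Consumer`, `restricted_max_entries`, `dispatch_once` and `simulate` are hypothetical names, the permit cap is a simplified stand-in for getRestrictedMaxEntriesForConsumer, and the key-to-consumer mapping is hard-coded so that every pending key belongs to consumer2.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    name: str
    permits: int = 0  # permits granted by the consumer's flow command
    received: list = field(default_factory=list)


def restricted_max_entries(consumer, entries):
    """Hypothetical stand-in for getRestrictedMaxEntriesForConsumer:
    never allow more entries than the consumer has permits for."""
    return min(len(entries), consumer.permits)


def dispatch_once(backlog, consumers, owner_of_key):
    """Run one dispatch loop and return how many messages were sent."""
    # Group pending messages by the consumer that owns their key.
    grouped = {}
    for entry in backlog:
        key, _payload = entry
        grouped.setdefault(owner_of_key(key), []).append(entry)

    sent = 0
    for name, entries in grouped.items():
        consumer = consumers[name]
        allowed = restricted_max_entries(consumer, entries)
        for entry in entries[:allowed]:
            consumer.received.append(entry)
            backlog.remove(entry)
            consumer.permits -= 1
            sent += 1
    return sent


def simulate(loops=3):
    # Second step: only consumer1's flow command (1000 permits) has arrived.
    consumers = {
        "consumer1": Consumer("consumer1", permits=1000),
        "consumer2": Consumer("consumer2", permits=0),
    }
    backlog = [(f"key-{i}", f"msg-{i}") for i in range(5)]
    # Third step (assumption): every pending key is assigned to consumer2.
    owner_of_key = lambda key: "consumer2"
    # Fourth step: each loop is capped at 0 entries, so nothing is sent.
    sent_per_loop = [dispatch_once(backlog, consumers, owner_of_key)
                     for _ in range(loops)]
    return sent_per_loop, consumers, backlog
```

In this model consumer1 keeps all 1000 permits and receives nothing, because no pending key maps to it, while consumer2 owns every key but has no permits. The sketch does not attempt to explain why the real dispatcher stays stuck after the restart; it only shows how the two conditions in the report combine into a loop that sends nothing.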