Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure the client can get response when encounter producer busy exception. #13245

Conversation

codelipenghui
Copy link
Contributor

When the producer with the same producer ID and same connection,
Of course, this usually doesn't happen. When cherry-picking #12846,
the test using the same producer ID https://github.com/apache/pulsar/pull/12846/files#diff-b73845cd7706e03b89122a21b3e60c36c95ce6a0c7dd4574f8eda3fc67dd02b1R867-R876
And after we apply the change https://github.com/apache/pulsar/pull/8685/files#diff-1e0e8195fb5ec5a6d79acbc7d859c025a9b711f94e6ab37c94439e99b3202e84R1161
Only one request can get a response when encountering exceptions.

The fix is to make sure the broker doesn't miss the response to the client,
we should make sure one request can get one response, otherwise the client
side might get blocked.

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

…ception.

When the producer with the same producer ID and same connection,
Of course this usually doesn't happen. When cherry-picking apache#12846,
the test using the same producer ID https://github.com/apache/pulsar/pull/12846/files#diff-b73845cd7706e03b89122a21b3e60c36c95ce6a0c7dd4574f8eda3fc67dd02b1R867-R876
And after we apply the change https://github.com/apache/pulsar/pull/8685/files#diff-1e0e8195fb5ec5a6d79acbc7d859c025a9b711f94e6ab37c94439e99b3202e84R1161
Only one request can get a response when encounter exceptions.

The fix is make sure the broker don't miss the response to the client,
we should make sure one request can get one response, otherwise the client
side might get blocked.
@codelipenghui
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should keep the current behavior. The only time that the producerFuture is completed early is when the client sends a CloseProducer command, and in that case, the client is explicitly telling the broker to stop tracking the prior request.

Note that one primary failure here is caused by client timeouts. The client tries to create the same producer multiple times (and closes it each time the producer creation times out), and then if the broker finally loads the topic, the futures will execute from the stack. The first one succeeds, and the rest fail because they all have the same name as the first producer.

I will be interested to see what other community members think here. Please feel free to dismiss my review if this PR is accepted.

Edit: a previous version of this comment assumed that this PR was solely for branch-2.7. After talking with @codelipenghui on slack, I realize my initial interpretation of the PR was wrong.

@codelipenghui
Copy link
Contributor Author

@merlimat @315157973 @hangc0276 Please help check this one.

eolivelli
eolivelli previously approved these changes Dec 28, 2021
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@michaeljmarshall ?

Copy link
Contributor

@315157973 315157973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michaeljmarshall
Copy link
Member

michaeljmarshall commented Jan 3, 2022

Makes sense to me.

@michaeljmarshall ?

@eolivelli - did you see my above comment #13245 (review)? I am opposed to this change because it makes the broker send a response to a client request that is known to be timed out. The broker knows this because the client completed the producerFuture by sending a CloseProducer command before the producer was created. The net effect of this change is that the client will log a meaningless error message for every timed out producer.

In my opinion, the core contract in this ServerCnx file is that the code that completes the producerFuture must also determine whether to send a response to the client. This change removes that contract and technically introduces a chance for the broker to respond to the same client request multiple times.

@eolivelli eolivelli dismissed their stale review January 3, 2022 21:31

Changed my mind

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaeljmarshall I missed your comment sorry

@wolfstudy
Copy link
Member

wolfstudy commented Jan 10, 2022

Thanks @codelipenghui
This problem is very much like a problem encountered by the Go SDK community recently, and all the phenomena seem to match the description in this pr.

Through tcpdump, it was found that Go SDK sent ProducerCommand to the broker every time it reconnected, but the broker did not reply any Response to the client side, which caused the Go SDK to fail to reconnect until the entire client service was restarted.

The broker version: branch-2.7.2
go SDK version: new master

And the issue refer to: apache/pulsar-client-go#697

@michaeljmarshall
Copy link
Member

@wolfstudy - note that a 2.7.2 broker has the behavior that @codelipenghui is proposing here.

@codelipenghui codelipenghui modified the milestones: 2.10.0, 2.11.0 Jan 21, 2022
@codelipenghui codelipenghui deleted the penghui/fix-loss-create-producer-response branch May 17, 2022 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants