New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ServerCnx] Close connection after receiving unexpected SendCommand #12780
[ServerCnx] Close connection after receiving unexpected SendCommand #12780
Conversation
I sent a note to the mailing list to discuss this change: https://lists.apache.org/thread/6y6jfdx432j2gqxgk9cnhdw48fq1m6b1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
I agree that this fix will also allow to deal with old Pulsas clients as we cannot fix released clients and there are already tons of existing applications in the wild
…pache#12780) (cherry picked from commit ba58095)
Hi @michaeljmarshall I think it makes sense to create an independent section under Producer chapter. Can you help add explanations for this feature? Thanks |
@michaeljmarshall double check: is this doc added for this PR? |
@Anonymitaet - that documentation is different. I'll submit a PR with documentation for this PR on Monday. |
Hi @michaeljmarshall any progress for docs?
|
Motivation
When a broker receives a message for a producer that is not registered, in the process of being registered, or failed registration, the message gets ignored. Because the Pulsar protocol allows a client to deliver multiple messages before receiving any acknowledgement, the behavior of ignoring messages presents a risk for persisting messages out of order. By itself, the broker will not persist out of order this way because the client is only supposed to send messages after registering the producer successfully on the connection. However, as we saw in #12779, a client that does not follow the protocol could persist messages out of order unless we update the behavior.
I propose closing the connection when "unexpected" messages are received.
One tradeoff for this implementation is that when the broker initiates closing a producer, there is a chance that the whole connection will get closed because the producer had messages in flight. I think this is a reasonable tradeoff to ensure that clients not following the protocol are not able to persist messages out-of-order.
From my perspective, this is the simplest solution that will ensure message order is preserved. Alternatively, we could come up with logic to try to handle messages sent to recently closed producers, but that would greatly increase the complexity for this edge case. Note that it is not sufficient to reply to each message with a
SendError
because the producer may have already sent later messages and those could be persisted if the producer is concurrently being created. Note also that when the Java Client producer receives a genericSendError
, it reacts by closing the connection in most cases.Modifications
ServerCnx
to close the connection when a broker receives a message for a producer that is not ready to handle that message.Verifying this change
I added a test to verify this new behavior.
Does this pull request potentially affect one of the following parts:
If
yes
was chosen, please highlight the changesThe wire protocol itself is not changed, but the way that the broker implements the protocol is changed. Since the broker can close the connection at any time, this change will not break any integrations.
Documentation
doc-required
I think we will want to update the documentation of the Pulsar protocol here https://pulsar.apache.org/docs/en/develop-binary-protocol/#producer. The current docs do not mention how a broker will handle such a case. I'll need a little help understanding how/where we want to update the documentation for this change.