Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ServerCnx] Close connection after receiving unexpected SendCommand #12780

Conversation

michaeljmarshall
Copy link
Member

Motivation

When a broker receives a message for a producer that is not registered, in the process of being registered, or failed registration, the message gets ignored. Because the Pulsar protocol allows a client to deliver multiple messages before receiving any acknowledgement, the behavior of ignoring messages presents a risk for persisting messages out of order. By itself, the broker will not persist out of order this way because the client is only supposed to send messages after registering the producer successfully on the connection. However, as we saw in #12779, a client that does not follow the protocol could persist messages out of order unless we update the behavior.

I propose closing the connection when "unexpected" messages are received.

One tradeoff for this implementation is that when the broker initiates closing a producer, there is a chance that the whole connection will get closed because the producer had messages in flight. I think this is a reasonable tradeoff to ensure that clients not following the protocol are not able to persist messages out-of-order.

From my perspective, this is the simplest solution that will ensure message order is preserved. Alternatively, we could come up with logic to try to handle messages sent to recently closed producers, but that would greatly increase the complexity for this edge case. Note that it is not sufficient to reply to each message with a SendError because the producer may have already sent later messages and those could be persisted if the producer is concurrently being created. Note also that when the Java Client producer receives a generic SendError, it reacts by closing the connection in most cases.

Modifications

  • Update ServerCnx to close the connection when a broker receives a message for a producer that is not ready to handle that message.

Verifying this change

I added a test to verify this new behavior.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: yes and no
  • The rest endpoints: no
  • The admin cli options: no
  • Anything that affects deployment: no

The wire protocol itself is not changed, but the way that the broker implements the protocol is changed. Since the broker can close the connection at any time, this change will not break any integrations.

Documentation

  • doc-required

I think we will want to update the documentation of the Pulsar protocol here https://pulsar.apache.org/docs/en/develop-binary-protocol/#producer. The current docs do not mention how a broker will handle such a case. I'll need a little help understanding how/where we want to update the documentation for this change.

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Nov 12, 2021
@michaeljmarshall
Copy link
Member Author

I sent a note to the mailing list to discuss this change: https://lists.apache.org/thread/6y6jfdx432j2gqxgk9cnhdw48fq1m6b1.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

I agree that this fix will also allow to deal with old Pulsas clients as we cannot fix released clients and there are already tons of existing applications in the wild

@aahmed-se aahmed-se merged commit ba58095 into apache:master Nov 13, 2021
eolivelli pushed a commit to datastax/pulsar that referenced this pull request Nov 15, 2021
eolivelli pushed a commit that referenced this pull request Nov 15, 2021
@Anonymitaet
Copy link
Member

Hi @michaeljmarshall I think it makes sense to create an independent section under Producer chapter. Can you help add explanations for this feature? Thanks

@Anonymitaet
Copy link
Member

@michaeljmarshall double check: is this doc added for this PR?

@michaeljmarshall michaeljmarshall deleted the close-connection-for-unexpected-message branch November 26, 2021 01:51
@michaeljmarshall
Copy link
Member Author

@michaeljmarshall double check: is this doc added for this PR?

@Anonymitaet - that documentation is different. I'll submit a PR with documentation for this PR on Monday.

@Anonymitaet
Copy link
Member

Hi @michaeljmarshall any progress for docs?

@michaeljmarshall double check: is this doc added for this PR?

@Anonymitaet - that documentation is different. I'll submit a PR with documentation for this PR on Monday.

@lhotari lhotari added this to the 2.10.0 milestone Dec 9, 2021
lhotari pushed a commit that referenced this pull request Dec 9, 2021
@lhotari lhotari added cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life and removed release/2.9.1 labels Dec 9, 2021
@Anonymitaet Anonymitaet added doc-complete Your PR changes impact docs and the related docs have been already added. and removed doc-required Your PR changes impact docs and you will update later. labels Feb 9, 2022
@codelipenghui codelipenghui modified the milestones: 2.10.0, 2.9.0 Apr 19, 2022
@codelipenghui codelipenghui removed release/2.9.0 cherry-picked/branch-2.9 Archived: 2.9 is end of life labels Apr 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.8 Archived: 2.8 is end of life doc-complete Your PR changes impact docs and the related docs have been already added. release/2.8.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants