Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Broker] Fix producer getting incorrectly removed from topic's produc… #12846

Merged

Conversation

michaeljmarshall
Copy link
Member

…ers map

Motivation

There is currently a bug that allows a producer to be removed from a topic's producers map, but not from the ServerCnx#producers map. As a consequence, two producers with the same name can connect at the same time.

The fundamental issue here is with the definition of equality for the Producer class. This class's definition of equality has been an issue before. @merlimat fixed one such issue here: #11804. That fix addressed some of the problems, but I think we need to go a step further in making sure that the wrong producer cannot be removed from the topic's producers map.

In the new test that I add with this PR, I show that it is still possible to remove the wrong producer. In the test, we essentially get two callbacks on the getOrCreateTopic future. Once that future succeeds, we create one producer and then the second fails and calls producer.closeNow(true), which removes the successful producer from the topic. It does not remove the successful topic from the ServerCnx because of the changes introduced in #9256.

The producer's equals method is only used when ensuring that the right producer is removed from the producers map and when determining if a producer is allowed to overwrite another producer. In my view, the simplest solution is to remove the overriding of equals and put the logic from equals into another method. This way removing a producer from a map is not too complicated (object equality is now reference equality), and we can still determine if a producer should override another.

Modifications

  • Remove Producer#equals and Producer#hashCode to rely on the default object implementations.
  • Add Producer#isSuccessorTo to retain the logic required to determine if a producer is okay to overwrite an existing producer.

Other Benefits

When a producer is not present in a topic's producers map, the producer does not report metrics.

Verifying this change

I added a new test that shows a way to reproduce the issue. The new test fails on master but succeeds on my branch. There are also many tests that verify this code path to ensure this change does not introduce a regression.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no
  • The rest endpoints: no
  • The admin cli options: no
  • Anything that affects deployment: no

Documentation

  • no-need-doc

This is an internal bug fix.

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Nov 17, 2021
@michaeljmarshall
Copy link
Member Author

@ivankelly - in looking at #9256, I am wondering if I should also submit a PR similar to this one but for consumers. I haven't researched the consumer path for removal yet, so I cannot say for sure, but if there was a race for closing the wrong consumers, it's possible that the definition of equality is too broad for the Consumer class, and if so, the wrong consumer could be removed from the topic. Let me know, thanks.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch

@michaeljmarshall
Copy link
Member Author

/pulsarbot run-failure-checks

Copy link
Contributor

@nicoloboschi nicoloboschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -147,22 +147,17 @@ private String parseRemoteClusterName(String producerName, boolean isRemote) {
return null;
}

@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave hashCode() & equals() with explanation on why it needs to be identity equal? Otherwise, someone will add it back later on...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests I've added/updated should fail if the definition of equality changes or becomes too broad. If you are worried that the tests are insufficient, I'm happy to add the overrides back in. Let me know.

@merlimat merlimat added this to the 2.10.0 milestone Nov 17, 2021
@merlimat merlimat added type/bug The PR fixed a bug or issue reported a bug release/2.9.1 labels Nov 17, 2021
@eolivelli eolivelli merged commit e33687d into apache:master Nov 18, 2021
zeo1995 pushed a commit to zeo1995/pulsar that referenced this pull request Nov 18, 2021
* up/master:
  Update deploy-bare-metal.md (apache#12432)
  [Broker] Fix producer getting incorrectly removed from topic's producers map (apache#12846)
  Add error log when new jetty client (apache#12840)
  JavaInstanceTest should be AssertEquals (apache#12836)
  [Transaction] Fix transaction flaky test testMaxReadPositionForNormalPublish (apache#12681)
  The problem of two exception handling (apache#12744)
  Fix TopicPoliciesCacheNotInitException issue. (apache#12773)
  Added local filesystem backend for package manager (apache#12708)
  [Java Client] Make userProvidedProducerName final (apache#12849)
  optimize indention in ServerCnx#handleProducer (apache#12854)
zeo1995 pushed a commit to zeo1995/pulsar that referenced this pull request Nov 18, 2021
* up/master:
  [Issue 12757][broker] add broker config isAllowAutoUpdateSchema (apache#12786)
  Update deploy-bare-metal.md (apache#12432)
  [Broker] Fix producer getting incorrectly removed from topic's producers map (apache#12846)
  Add error log when new jetty client (apache#12840)
  JavaInstanceTest should be AssertEquals (apache#12836)
  [Transaction] Fix transaction flaky test testMaxReadPositionForNormalPublish (apache#12681)
  The problem of two exception handling (apache#12744)
  Fix TopicPoliciesCacheNotInitException issue. (apache#12773)
  Added local filesystem backend for package manager (apache#12708)
lhotari pushed a commit to datastax/pulsar that referenced this pull request Nov 19, 2021
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Nov 22, 2021
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Nov 22, 2021
nicoloboschi added a commit to datastax/pulsar that referenced this pull request Nov 22, 2021
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request Nov 23, 2021
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request Nov 29, 2021
lhotari pushed a commit that referenced this pull request Dec 9, 2021
lhotari pushed a commit that referenced this pull request Dec 9, 2021
@lhotari lhotari added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Dec 9, 2021
@michaeljmarshall michaeljmarshall deleted the fix-removing-producers-from-topic branch December 9, 2021 14:33
codelipenghui pushed a commit that referenced this pull request Dec 12, 2021
@codelipenghui codelipenghui added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label Dec 12, 2021
codelipenghui pushed a commit that referenced this pull request Dec 12, 2021
codelipenghui added a commit to codelipenghui/incubator-pulsar that referenced this pull request Dec 12, 2021
…ception.

When the producer with the same producer ID and same connection,
Of course this usually doesn't happen. When cherry-picking apache#12846,
the test using the same producer ID https://github.com/apache/pulsar/pull/12846/files#diff-b73845cd7706e03b89122a21b3e60c36c95ce6a0c7dd4574f8eda3fc67dd02b1R867-R876
And after we apply the change https://github.com/apache/pulsar/pull/8685/files#diff-1e0e8195fb5ec5a6d79acbc7d859c025a9b711f94e6ab37c94439e99b3202e84R1161
Only one request can get a response when encounter exceptions.

The fix is make sure the broker don't miss the response to the client,
we should make sure one request can get one response, otherwise the client
side might get blocked.
michaeljmarshall added a commit to michaeljmarshall/pulsar that referenced this pull request Dec 12, 2021
@eolivelli eolivelli added cherry-picked/branch-2.9 Archived: 2.9 is end of life release/2.9.1 and removed release/2.9.2 labels Dec 15, 2021
fxbing pushed a commit to fxbing/pulsar that referenced this pull request Dec 19, 2021
aloyszhang pushed a commit to aloyszhang/pulsar that referenced this pull request Aug 5, 2022
Squash merge branch 'fix_producer' into '2.8.1'
Fixes #<xyz>

### Motivation

--bug=97153739 producer重名问题修复 
chery pick from apache#12846

TAPD: --bug=097153739
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.7 Archived: 2.7 is end of life cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.7.4 release/2.8.2 release/2.9.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants