
fix: bump default version to 1.0.0 #1791

Merged
merged 7 commits into IBM:master Sep 24, 2020

Conversation

stanislavkozlovski
Contributor

@stanislavkozlovski commented Aug 26, 2020

This PR fixes #1787. As explained there:

The current default, 0.8.2, is a 5-year-old version. The problems with defaulting to an older version when the Kafka cluster runs a newer one are manifold:

  • it requires Kafka to down-convert consumer fetch responses (e.g. if a newly-configured client produced messages and a default Sarama client is reading them). This hurts Kafka's performance because it costs additional CPU, especially when compression is enabled, in which case the broker must decompress, down-convert and re-compress each batch. Down-conversion also defeats zero-copy, because the fetch response now has to be copied into memory
  • it requires Kafka to up-convert the messages in producer requests
  • it is in general hard to support. While the latest Kafka currently supports the oldest versions, it is in the project's best interest to deprecate and eventually drop support for legacy versions. Otherwise maintenance gets harder, the test matrix grows, and new features have to work around old-version limitations (no idempotence, no exactly-once semantics). It is easier for Kafka to deprecate and drop support when its ecosystem has already done so

0.11 is the minimum we should default to, as it enables Kafka's v2 message format, avoiding expensive up-/down-conversion in the Kafka broker. It is also required for correctness in some cases (e.g. KIP-101).
1.0.0 is a 3-year-old version and I think a good one to default to at this point.
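
For illustration (not part of the PR), a minimal sketch of how a caller pins the protocol version explicitly instead of relying on the library default; the broker address is a placeholder:

```go
package main

import (
	"log"

	"github.com/Shopify/sarama" // import path at the time of this PR
)

func main() {
	cfg := sarama.NewConfig()
	// Before this PR the default was sarama.V0_8_2_0; with it, the default
	// becomes sarama.V1_0_0_0. Pinning the version explicitly documents
	// intent and lets the broker skip up-/down-conversion entirely.
	cfg.Version = sarama.V1_0_0_0
	cfg.Producer.Return.Successes = true // required by SyncProducer

	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()
}
```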

@ghost added the cla-needed label Aug 26, 2020
@stanislavkozlovski
Contributor Author

cc @d1egoaz @ijuma @varun06 @twmb

@stanislavkozlovski changed the title from "fix: bump default version to 0.11.0.2" to "fix: bump default version to 1.0.0" Aug 26, 2020
@varun06
Contributor

varun06 commented Aug 26, 2020

```
##[error]    async_producer_test.go:170: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
--- FAIL: TestAsyncProducerMultipleFlushes (90.77s)
```

hmm, that doesn't look related

@edenhill

> 0.11 is the minimum we should default

It should be sufficient to use 0.10 as the default and then use ApiVersionRequest to select the highest supported version for each protocol request.
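
A hedged sketch of the per-API negotiation edenhill describes (none of these names come from Sarama; this is illustration only): the client asks the broker which versions it supports for each API key and then uses the highest version both sides understand.

```go
// negotiate picks the protocol version to use for one API key, given the
// highest version the client implements and the range the broker advertised
// in its ApiVersions response. Purely illustrative; not Sarama code.
func negotiate(clientMax, brokerMin, brokerMax int16) (version int16, ok bool) {
	if clientMax < brokerMin {
		return 0, false // no overlap: the client is too old for this API
	}
	if clientMax > brokerMax {
		return brokerMax, true // cap at what the broker supports
	}
	return clientMax, true
}
```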

@dnwe
Collaborator

dnwe commented Aug 27, 2020

@edenhill yes, the plan is to migrate to ApiVersionRequest in a v2.x release, but I think this is a reasonable default behaviour to leave the 1.x series at, so that any user who doesn't override the version won't be a burden on their cluster

@ijuma

ijuma commented Aug 27, 2020

@dnwe +1

@dnwe
Collaborator

dnwe commented Aug 27, 2020

> ##[error]    async_producer_test.go:170: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
> --- FAIL: TestAsyncProducerMultipleFlushes (90.77s)
>
> hmm, that doesn't look related

It is related, because the mock broker is sending a V0 MetadataResponse when the client (at V1_0_0_0) has sent a V5 MetadataRequest.
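
A sketch of the mismatch being described, using sarama's mock broker inside a test (the test name and client setup are placeholders): the queued response must be encoded at the same version as the request the client will send, and a zero-value MetadataResponse encodes as V0.

```go
package sarama_test

import (
	"testing"

	"github.com/Shopify/sarama"
)

func TestMetadataVersionMismatch(t *testing.T) {
	mb := sarama.NewMockBroker(t, 1)
	defer mb.Close()

	metadataResponse := new(sarama.MetadataResponse)
	metadataResponse.Version = 5 // match the V5 request a V1_0_0_0 client sends
	metadataResponse.AddBroker(mb.Addr(), mb.BrokerID())
	mb.Returns(metadataResponse)

	// ...create a client with config.Version = sarama.V1_0_0_0 against
	// mb.Addr() here; with Version left at 0 above, it would hang or fail.
}
```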

@stanislavkozlovski
Contributor Author

I'm having a bit of a hard time pinpointing and fixing the issue. I took @dnwe's advice and fixed the test to use V5 MetadataResponses, but it still results in a 10-minute timeout with a bunch of threads stuck, apparently when closing the producer and draining the in-flight requests.

As a side note, we could benefit from having a single place to define the request version (and we could use it for responses in tests too); I see we set the metadata request version according to conf.Version in multiple places in the sender's code. I don't want to increase the scope of this PR, but thought I'd note the potential improvement in case we want to capture it in a GitHub issue.
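
A hypothetical helper illustrating that centralization (the function name and exact version cut-offs are mine, not copied from Sarama's source):

```go
// metadataRequestVersion maps the configured Kafka version to the metadata
// request version in one place, instead of repeating the mapping at every
// call site. Illustrative cut-offs only.
func metadataRequestVersion(v sarama.KafkaVersion) int16 {
	switch {
	case v.IsAtLeast(sarama.V1_0_0_0):
		return 5
	case v.IsAtLeast(sarama.V0_10_0_0):
		return 1
	default:
		return 0
	}
}
```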

@dnwe
Collaborator

dnwe commented Aug 27, 2020

@stanislavkozlovski it'll be making V3 ProducerRequests too, so I think you'll need V3 ProducerResponses

@dnwe
Collaborator

dnwe commented Aug 27, 2020

If you add `Logger = log.New(os.Stderr, t.Name(), log.LstdFlags)` at the start of the failing test(s), you can see what the mock broker is doing.

However, as a simpler measure you might want to just update the tests to explicitly specify `Version = MinVersion` in their config, so that they continue to exercise the behaviour as they used to. We already have (some) tests that exercise specific versions.
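
Both suggestions together, as a sketch (the test name is a placeholder; `sarama.Logger` and `sarama.MinVersion` are real package-level symbols):

```go
package sarama_test

import (
	"log"
	"os"
	"testing"

	"github.com/Shopify/sarama"
)

func TestSomethingAgainstMockBroker(t *testing.T) {
	// Surface sarama's debug log so the mock broker traffic is visible
	// when diagnosing a failure.
	sarama.Logger = log.New(os.Stderr, t.Name()+" ", log.LstdFlags)

	// Pin the oldest supported version so the existing V0 mock responses
	// still match the requests the client sends.
	cfg := sarama.NewConfig()
	cfg.Version = sarama.MinVersion
	_ = cfg // remainder of the test elided
}
```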

@stanislavkozlovski
Contributor Author

Thanks @dnwe, I like the approach of switching the tests back to the old Kafka version; ensuring each request and response share the same version seems like a large task, probably better deferred to a separate issue.

The latest commit adds a NewTestConfig helper to both the mocks package and sarama. Let me know if I should open a GH issue and mention its number in that function. It'd be good to eventually resolve this, as it'll be a gotcha for developers who use NewConfig() in tests and see it fail in odd ways.
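
A sketch of what such a helper might look like; the doc comment is quoted verbatim later in this thread, while the body here is a plausible reconstruction rather than the exact diff:

```go
// NewTestConfig returns a config meant to be used by tests.
// Due to inconsistencies with the request versions the clients send using the default Kafka version
// and the response versions our mocks use, we default to the minimum Kafka version in most tests
func NewTestConfig() *Config {
	config := NewConfig()
	// Keep requests and mock responses at matching (old) protocol versions.
	config.Version = MinVersion
	return config
}
```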

@stanislavkozlovski
Contributor Author

cc @varun06 @dnwe I've got the build passing now. Unfortunately I had to sprinkle a special test config across most tests so that they don't default to the new version

@varun06
Contributor

varun06 commented Sep 5, 2020

@stanislavkozlovski Thanks for getting this through. Use of NewTestConfig seems fine, I wonder if we should add some more documentation around it 🤔

Contributor

@varun06 left a comment

SGTM

@stanislavkozlovski
Contributor Author

Thanks for the review @varun06. I had added documentation around the method:

```go
// NewTestConfig returns a config meant to be used by tests.
// Due to inconsistencies with the request versions the clients send using the default Kafka version
// and the response versions our mocks use, we default to the minimum Kafka version in most tests
func NewTestConfig()
```

Let me know if you can think of more to say, or whether there's a better place to document it.

@stanislavkozlovski
Contributor Author

Hey there @varun06 @dnwe @bai - what should our next steps be in order to merge this PR?

@varun06
Contributor

varun06 commented Sep 24, 2020

Need one more approval and then we can merge it.

@bai requested a review from dnwe September 24, 2020 12:21
Collaborator

@dnwe left a comment

LGTM — sorry for the delay

@dnwe merged commit a904204 into IBM:master Sep 24, 2020
@d1egoaz mentioned this pull request Oct 7, 2020