Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BalanceStrategyRange does not behave like Java / librdkafka #2243

Closed
njhartwell opened this issue Jun 6, 2022 · 3 comments
Closed

BalanceStrategyRange does not behave like Java / librdkafka #2243

njhartwell opened this issue Jun 6, 2022 · 3 comments

Comments

@njhartwell
Copy link
Contributor

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama Kafka Go
0c4ba61 7.0.1 1.17.2
Problem Description

From the name / description, it looks like sarama's Range balance strategy seeks to mimic the Java and librdkafka behavior.

The Java / librdkafka behavior is desirable because it allows a consumer to subscribe to multiple topics that have "co-partitioned" data (i.e. each topic has the same partition count, key format, and partitioning function) and be guaranteed to receive the same subset of keys across all subscribed topics.

BalanceStrategyRange (by design?) does not exhibit the same behavior; there is even a test case indicating as much. It looks like it's using a hash of the topic and consumer id to determine the order to dole out partitions, which leads to seemingly non-deterministic behavior, since consumer IDs are usually random. Here's an additional test case for TestBalanceStrategyRange that shows how changing consumer group names can, confusingly, cause the behavior to look like Java's:

func TestBalanceStrategyRange(t *testing.T) {
	tests := []struct {
		members  map[string][]string
		topics   map[string][]int32
		expected BalanceStrategyPlan
	}{
		{
			members: map[string][]string{"M1": {"T1", "T2"}, "M2": {"T1", "T2"}},
			topics:  map[string][]int32{"T1": {0, 1, 2, 3}, "T2": {0, 1, 2, 3}},
			expected: BalanceStrategyPlan{
				"M1": map[string][]int32{"T1": {0, 1}, "T2": {2, 3}},
				"M2": map[string][]int32{"T1": {2, 3}, "T2": {0, 1}},
			},
		},
                // New test case. Two new consumer groups with same lexicographical order different result
		{
			members: map[string][]string{"M3": {"T1", "T2"}, "M4": {"T1", "T2"}},
			topics:  map[string][]int32{"T1": {0, 1, 2, 3}, "T2": {0, 1, 2, 3}},
			expected: BalanceStrategyPlan{
				"M3": map[string][]int32{"T1": {0, 1}, "T2": {0, 1}},
				"M4": map[string][]int32{"T1": {2, 3}, "T2": {2, 3}},
			},
		},

If it would be problematic to update this strategy to accurately implement the Java library's behavior, it would be very helpful to at least have an alternative strategy that does and some warning about the discrepancy. If that seems reasonable, I'd be happy to contribute a PR in the near future.

@dnwe
Copy link
Collaborator

dnwe commented Jun 6, 2022

@njhartwell 👋🏻 hi and thanks for raising this issue

Yes the intention should always be that the Sarama balance strategies match the Java/librdkafka implementations. We had a similar PR a year or so ago (#1788) which corrected and replaced the implementation of round robin strategy, so I'd be happy to review and merge one which did the same for the range strategy

@njhartwell
Copy link
Contributor Author

Thanks for the quick response @dnwe! Here it is: #2245

@dnwe
Copy link
Collaborator

dnwe commented Jun 13, 2022

Thanks @njhartwell — closed as fixed by #2245 and released in v1.34.1

@dnwe dnwe closed this as completed Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants