
Consumer status is unpredictable when multiple topics are consumed #796

Open
ashi009 opened this issue Jan 9, 2024 · 0 comments

ashi009 commented Jan 9, 2024

```go
for topic, partitions := range topics {
	for partitionID, partition := range partitions {
		partitionStatus := evaluatePartitionStatus(partition, module.minimumComplete, module.allowedLag)
		partitionStatus.Topic = topic
		partitionStatus.Partition = int32(partitionID)
		partitionStatus.Owner = partition.Owner
		partitionStatus.ClientID = partition.ClientID

		if partitionStatus.Status > status.Status {
			// If the partition status is greater than StatusError, we just mark it as StatusError
			if partitionStatus.Status > protocol.StatusError {
				status.Status = protocol.StatusError
			} else {
				status.Status = partitionStatus.Status
			}
		}

		if (status.Maxlag == nil) || (partitionStatus.CurrentLag > status.Maxlag.CurrentLag) {
			status.Maxlag = partitionStatus
		}
		if partitionStatus.Complete == 1.0 {
			completePartitions++
		}
		status.Partitions[count] = partitionStatus
		count++
	}
}
```

This piece of code loops over a map of topics, so if the last topic's last partition reports OK, the consumer status ends up OK.

Since map iteration order in Go is randomized, the resulting consumer status is unpredictable.
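To make the randomization concrete, here is a minimal standalone sketch (not Burrow code) that ranges over the same map repeatedly and counts how many distinct key orders show up. The map contents and the `distinctOrders` helper are made up purely for illustration:

```go
package main

import "fmt"

// distinctOrders ranges over the same map n times and counts how many
// distinct key orders appear. Go deliberately randomizes the starting
// point of each map iteration, so with several keys more than one
// order shows up almost immediately.
func distinctOrders(n int) int {
	m := map[string]bool{
		"a": true, "b": true, "c": true, "d": true,
		"e": true, "f": true, "g": true, "h": true,
	}
	seen := map[string]bool{}
	for i := 0; i < n; i++ {
		order := ""
		for k := range m {
			order += k
		}
		seen[order] = true
	}
	return len(seen)
}

func main() {
	// With 8 keys and 100 passes, seeing only a single order is
	// astronomically unlikely.
	fmt.Println(distinctOrders(100) > 1)
}
```

This is exactly why code whose result depends on which map entry is visited last behaves differently from run to run.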

The real-world effects of this are:

  1. The metric from Burrow for a consumer, scraped at a 2m interval:
    (image)

  2. The metric from burrow-exporter, which queries Burrow at a 30s interval and is then scraped at a 2m interval:
    (image)

The more frequently we query (Burrow uses a 30s cache expiration by default), the more likely we are to see a non-OK consumer status.
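One way to make the result independent of iteration order (a sketch only, not a patch against Burrow; the `PartitionStatus` struct and `worstStatus` helper here are hypothetical) is to reduce all partitions with a pure max, optionally sorting topic names first so evaluation order is fully deterministic:

```go
package main

import (
	"fmt"
	"sort"
)

// PartitionStatus is a simplified stand-in for Burrow's per-partition
// status; a higher Status value means a worse state.
type PartitionStatus struct {
	Topic     string
	Partition int32
	Status    int
}

// worstStatus returns the maximum Status over all partitions. Taking a
// max is commutative, so the answer cannot depend on map iteration
// order; sorting the topic names additionally fixes the visit order.
func worstStatus(topics map[string][]PartitionStatus) int {
	names := make([]string, 0, len(topics))
	for name := range topics {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic visit order

	worst := 0
	for _, name := range names {
		for _, p := range topics[name] {
			if p.Status > worst {
				worst = p.Status
			}
		}
	}
	return worst
}

func main() {
	topics := map[string][]PartitionStatus{
		"orders": {{Topic: "orders", Partition: 0, Status: 2}},
		"events": {{Topic: "events", Partition: 0, Status: 0}},
	}
	// The worst partition wins no matter which topic is visited last.
	fmt.Println(worstStatus(topics)) // prints 2
}
```

The key property is that the aggregation is order-insensitive: any logic that overwrites the consumer status based on whichever partition happens to be evaluated last will inherit the map's randomized order.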
