"maxFailure (quorum) on a given error family" error: consider different wording #5732

Open
kevinburkesegment opened this issue Jan 18, 2024 · 1 comment

Comments

@kevinburkesegment

Recently, we received an error message that included this text, inside of pkg/ring/batch.go. Based on this wording and other contextual info, we assumed that the error was internal to our Prometheus instance, possibly related to having too few nodes to reach quorum. We then began to investigate cluster health, which turned out to be a red herring. Since we don't manage these Prometheus nodes ourselves, the investigation took a long time and a lot of effort.

It turns out that we instead had a client sending out-of-order metrics, and all this error message was trying to say was that several different Prometheus nodes agreed that this was a problem.

How about rewording this error message to say, e.g., "data write failed by consensus"?
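
To make the suggestion concrete, here is a minimal sketch of the kind of message that would have pointed us in the right direction. It is written in Python for brevity and is not Cortex's actual code (Cortex is written in Go). The function names, the grouping of errors into 4xx/5xx families by status code, and the message wording are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def error_family(status_code: int) -> str:
    """Group a status code into a coarse family (assumed grouping)."""
    return "client (4xx)" if 400 <= status_code < 500 else "server (5xx)"


def describe_write_failure(
    replica_errors: Sequence[int], max_failures: int
) -> Optional[str]:
    """Return a user-facing message once one error family exceeds max_failures.

    replica_errors holds one status code per replica that rejected the write.
    Returns None while no single family has exceeded the threshold.
    """
    counts = Counter(error_family(code) for code in replica_errors)
    for family, count in counts.items():
        if count > max_failures:
            # Hypothetical hint: tell the reader where to look first.
            hint = (
                "check the data being sent"
                if family.startswith("client")
                else "check cluster health"
            )
            return (
                f"write rejected by {count} replicas with the same "
                f"{family} error; {hint}"
            )
    return None


print(describe_write_failure([400, 400], max_failures=1))
```

A message along these lines says whether the replicas agreed on a client-side or a server-side problem, which is the distinction we were missing when we started looking at cluster health.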

@yeya24
Collaborator

yeya24 commented Jan 28, 2024

Hi @kevinburkesegment, honestly I don't see much difference between "maxFailure (quorum) on a given error family" and "data write failed by consensus". They mostly say the same thing.

> Recently, we received an error message that included this text, inside of pkg/ring/batch.go

Do you manage Cortex but not the Prometheus instances that send data to the Cortex cluster? I would love to hear more about how this error message impacts the user experience.
