Recently, we received an error message that included this text, which comes from pkg/ring/batch.go. Based on this wording and other contextual info, we assumed that the error was internal to our Prometheus instance, possibly related to having too few nodes to reach quorum. We then began to investigate cluster health, which turned out to be a red herring. Since we don't manage these Prometheus nodes ourselves, this took a lot of time and effort.
It turns out that we instead had a client sending out-of-order metrics, and all this error message was trying to say was that several different Prometheus nodes agreed that this was a problem.
How about rewording this error message to say, e.g., "data write failed by consensus"?
Hi @kevinburkesegment, honestly I don't see much difference between "maxFailure (quorum) on a given error family" and "data write failed by consensus". They mean mostly the same thing.
Recently, we received an error message that included this text, inside of pkg/ring/batch.go
Do you manage Cortex but not the Prometheus instances that send data to the Cortex cluster? I would love to hear more about how this error message impacts the user experience.