Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prompb: Add histograms to remote write/read protobufs #10870

Merged
merged 1 commit into from Jun 29, 2022

Conversation

beorn7
Copy link
Member

@beorn7 beorn7 commented Jun 15, 2022

This is only for the sparsehistogram branch!

A first attempt to add histograms to remote write/read protobufs.

@bboreham again cc'ing you to double-check if this makes sense.

Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my limited knowledge on protos, LGTM

prompb/types.proto Outdated Show resolved Hide resolved
@beorn7
Copy link
Member Author

beorn7 commented Jun 19, 2022

Thanks, @codesome . Still waiting for remote write folks to comment.

Note that prometheus/client_model#58 has changes to the exposition format protobuf, adding support for float and gauge histograms, with some possibly enlightening comments.

Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections from me.
I do think you should link to the design (which I think is here.

Also mention / link to the explanation for "float histograms".

Comment on lines +91 to +107
sint32 offset = 1; // Gap to previous span, or starting point for 1st span (which can be negative).
uint32 length = 2; // Length of consecutive buckets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note these are optional in client_model, and I couldn't figure out why.
If the answer is other than "mistake", please explain both cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that's mostly historical. The proto spec in prometheus/client_model is using the proto2 syntax. Back then, protobuf had optional and non-optional fields. There were different schools of thought. Some considered optional fields harmful, some considered non-optional fields harmful. The Prometheus exposition protobuf was created following the latter school of thought, according to which you declare all fields optional, even if they aren't really optional.

proto3 resolved the schism by (arguably) declaring the "make everything optional" school as the winner. In proto3, all fields are implicitly optional. If they are missing, the default value is filled in, and (on the API level) you cannot actually distinguish between "unset" and "explicitly specified the value that happens to be the default value". If you really want an "unset" state, you have to declare a field as repeated but only care about zero or one repeats.

If the above is inaccurate, please pardon me. My context about protobuf is quite incomplete.

repeated Span span = 1;
// Only one of "delta" or "count" may be used, the former for regular
// histograms with integer counts, the latter for float histograms.
repeated sint64 delta = 2; // Count delta of each bucket compared to previous one (or to zero for 1st bucket).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, logically these values belong with the Span, but as an optimisation to reduce Protobuf metadata (tag and length) you have pulled them up one level and concatenated them, so the encoder/decoder must do a little extra work.
Either way, I think it would be worth some more comments so readers can understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are quite right. I'll add more comments.

For the decoder, it's actually less work, because the protobuf layout happens to be the same as the layout in TSDB. (Which isn't a reason to do it that way on its own, of course, just a convenient coincidence.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For remote write, it's actually the TSDB layout on both sides, so it's even less work for the encoder… (Encoding is more costly for exposition, depending on the way the instrumentation library manages the buckets internally.)

Copy link
Member Author

@beorn7 beorn7 Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context: In TSDB, the layout of buckets (i.e. the set of Spans) is even the same across a whole chunk (currently up to 120 histograms). That's where specifying the spans separately (and only once) makes a huge dent. I was considering doing the same for all samples in one batch of remote-write, but decided against it for now, mostly for the following reason.

  1. Batching multiple samples of the same series into one remote-write request isn't happening that often, and even if it does, it is more like a handful of samples, not ~100. Therefore, this optimization doesn't really buy us a lot, while it comes with the cost of a leaky abstraction and more complex remote-write code (see next item).
  2. While consecutive samples of the same series are likely to come from the same chunk (and therefore have the same bucket layout), it's not guaranteed (or in other words: remote-write doesn't know anything about TSDB chunks, and it also shouldn't know anything about it). The remote-write code needed to reconcile different bucket layouts (by inserting empty buckets where needed) if they happen to end up in the same batch. (TSDB is doing exactly that reconciliation, but that's because there is a huge return for this investment, see above.) Inversely, there isn't any guarantee, either, that a remote-write receiver will put samples from the same batch into the same chunk (if it is a chunk-based storage at all).

@beorn7
Copy link
Member Author

beorn7 commented Jun 19, 2022

Also mention / link to the explanation for "float histograms".

I think the already mentioned prometheus/client_model#58 is a good starting point to understand the need for "float histograms". I expect the most common use case for remote write will be to send the result of recording rules to the remote storage. (If you rate a histogram, you inevitably get a float histogram as a result.)

@beorn7 beorn7 force-pushed the beorn7/protobuf branch 2 times, most recently from 05f89d3 to 3dac538 Compare June 21, 2022 11:53
@beorn7
Copy link
Member Author

beorn7 commented Jun 21, 2022

I added two more things:

  1. Utilized oneof (a proto3 feature) to mark the various one-of fields more explicitly. (Note that oneof isn't compatible with repeated, so for count vs delta in Buckets, we still depend on the user to only use one or the other.)
  2. Added the HISTOGRAM chunk encoding, so that this should in principle be ready for chunked remote-read. (Note that STREAMED_XOR_CHUNKS turns out to be a bad name. Given that the Chunk message has an Encoding type field, it was always ready to contain non-XOR chunks, so the name in the enum should be simply STREAMED_CHUNKS. Not sure about the implication of changing it now, so I left the old name in place.)

@beorn7
Copy link
Member Author

beorn7 commented Jun 29, 2022

I have now also added a link to the design doc in the doc comments, including a pointer to the concept of float histograms.

This is just for the sparsehistogram branch, and it got a few looks already. I'm merging it now, but we can still iterate on it. Whoever works on implementing the actual use of these proto messages (myself, @cstyan, @csmarchbanks, maybe even @LeviHarrison), feel free to come back to the proto spec and adjust it to needs and insights found later.

Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7 beorn7 merged commit a97b64c into sparsehistogram Jun 29, 2022
@beorn7 beorn7 deleted the beorn7/protobuf branch June 29, 2022 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants