prompb: Add histograms to remote write/read protobufs #10870

beorn7 · 2022-06-15T15:25:42Z

This is only for the sparsehistogram branch!

A first attempt to add histograms to remote write/read protobufs.

@bboreham again cc'ing you to double-check if this makes sense.

codesome

With my limited knowledge on protos, LGTM

prompb/types.proto

beorn7 · 2022-06-19T10:58:51Z

Thanks, @codesome . Still waiting for remote write folks to comment.

Note that prometheus/client_model#58 has changes to the exposition format protobuf, adding support for float and gauge histograms, with some possibly enlightening comments.

bboreham

No objections from me.
I do think you should link to the design (which I think is here.

Also mention / link to the explanation for "float histograms".

bboreham · 2022-06-19T15:02:16Z

prompb/types.proto

+    sint32 offset = 1; // Gap to previous span, or starting point for 1st span (which can be negative).
+    uint32 length = 2; // Length of consecutive buckets.


I note these are optional in client_model, and I couldn't figure out why.
If the answer is other than "mistake", please explain both cases.

I believe that's mostly historical. The proto spec in prometheus/client_model is using the proto2 syntax. Back then, protobuf had optional and non-optional fields. There were different schools of thought. Some considered optional fields harmful, some considered non-optional fields harmful. The Prometheus exposition protobuf was created following the latter school of thought, according to which you declare all fields optional, even if they aren't really optional.

proto3 resolved the schism by (arguably) declaring the "make everything optional" school as the winner. In proto3, all fields are implicitly optional. If they are missing, the default value is filled in, and (on the API level) you cannot actually distinguish between "unset" and "explicitly specified the value that happens to be the default value". If you really want an "unset" state, you have to declare a field as repeated but only care about zero or one repeats.

If the above is inaccurate, please pardon me. My context about protobuf is quite incomplete.

bboreham · 2022-06-19T15:13:41Z

prompb/types.proto

+  repeated Span span = 1;
+  // Only one of "delta" or "count" may be used, the former for regular
+  // histograms with integer counts, the latter for float histograms.
+  repeated sint64 delta = 2; // Count delta of each bucket compared to previous one (or to zero for 1st bucket).


If I understand correctly, logically these values belong with the Span, but as an optimisation to reduce Protobuf metadata (tag and length) you have pulled them up one level and concatenated them, so the encoder/decoder must do a little extra work.
Either way, I think it would be worth some more comments so readers can understand.

I think you are quite right. I'll add more comments.

For the decoder, it's actually less work, because the protobuf layout happens to be the same as the layout in TSDB. (Which isn't a reason to do it that way on its own, of course, just a convenient coincidence.)

For remote write, it's actually the TSDB layout on both sides, so it's even less work for the encoder… (Encoding is more costly for exposition, depending on the way the instrumentation library manages the buckets internally.)

For more context: In TSDB, the layout of buckets (i.e. the set of Spans) is even the same across a whole chunk (currently up to 120 histograms). That's where specifying the spans separately (and only once) makes a huge dent. I was considering doing the same for all samples in one batch of remote-write, but decided against it for now, mostly for the following reason.

Batching multiple samples of the same series into one remote-write request isn't happening that often, and even if it does, it is more like a handful of samples, not ~100. Therefore, this optimization doesn't really buy us a lot, while it comes with the cost of a leaky abstraction and more complex remote-write code (see next item).

While consecutive samples of the same series are likely to come from the same chunk (and therefore have the same bucket layout), it's not guaranteed (or in other words: remote-write doesn't know anything about TSDB chunks, and it also shouldn't know anything about it). The remote-write code needed to reconcile different bucket layouts (by inserting empty buckets where needed) if they happen to end up in the same batch. (TSDB is doing exactly that reconciliation, but that's because there is a huge return for this investment, see above.) Inversely, there isn't any guarantee, either, that a remote-write receiver will put samples from the same batch into the same chunk (if it is a chunk-based storage at all).

beorn7 · 2022-06-19T16:30:59Z

Also mention / link to the explanation for "float histograms".

I think the already mentioned prometheus/client_model#58 is a good starting point to understand the need for "float histograms". I expect the most common use case for remote write will be to send the result of recording rules to the remote storage. (If you rate a histogram, you inevitably get a float histogram as a result.)

beorn7 · 2022-06-21T11:57:15Z

I added two more things:

Utilized oneof (a proto3 feature) to mark the various one-of fields more explicitly. (Note that oneof isn't compatible with repeated, so for count vs delta in Buckets, we still depend on the user to only use one or the other.)
Added the HISTOGRAM chunk encoding, so that this should in principle be ready for chunked remote-read. (Note that STREAMED_XOR_CHUNKS turns out to be a bad name. Given that the Chunk message has an Encoding type field, it was always ready to contain non-XOR chunks, so the name in the enum should be simply STREAMED_CHUNKS. Not sure about the implication of changing it now, so I left the old name in place.)

beorn7 · 2022-06-29T15:28:34Z

I have now also added a link to the design doc in the doc comments, including a pointer to the concept of float histograms.

This is just for the sparsehistogram branch, and it got a few looks already. I'm merging it now, but we can still iterate on it. Whoever works on implementing the actual use of these proto messages (myself, @cstyan, @csmarchbanks, maybe even @LeviHarrison), feel free to come back to the proto spec and adjust it to needs and insights found later.

Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 requested review from cstyan, csmarchbanks and codesome June 15, 2022 15:25

codesome approved these changes Jun 17, 2022

View reviewed changes

prompb/types.proto Outdated Show resolved Hide resolved

beorn7 force-pushed the beorn7/protobuf branch from c56385e to 46a55d0 Compare June 19, 2022 10:57

bboreham reviewed Jun 19, 2022

View reviewed changes

beorn7 force-pushed the beorn7/protobuf branch 2 times, most recently from 05f89d3 to 3dac538 Compare June 21, 2022 11:53

beorn7 force-pushed the beorn7/protobuf branch from 3dac538 to 6975734 Compare June 29, 2022 15:25

prompb: Add histograms to remote write/read protobufs

d9d51c5

Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 force-pushed the beorn7/protobuf branch from 6975734 to d9d51c5 Compare June 29, 2022 16:01

beorn7 merged commit a97b64c into sparsehistogram Jun 29, 2022

beorn7 deleted the beorn7/protobuf branch June 29, 2022 16:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

prompb: Add histograms to remote write/read protobufs #10870

prompb: Add histograms to remote write/read protobufs #10870

beorn7 commented Jun 15, 2022 •

edited

codesome left a comment

beorn7 commented Jun 19, 2022

bboreham left a comment

bboreham Jun 19, 2022

beorn7 Jun 19, 2022

bboreham Jun 19, 2022

beorn7 Jun 19, 2022

beorn7 Jun 19, 2022

beorn7 Jun 21, 2022 •

edited

beorn7 commented Jun 19, 2022

beorn7 commented Jun 21, 2022

beorn7 commented Jun 29, 2022

		sint32 offset = 1; // Gap to previous span, or starting point for 1st span (which can be negative).
		uint32 length = 2; // Length of consecutive buckets.

prompb: Add histograms to remote write/read protobufs #10870

prompb: Add histograms to remote write/read protobufs #10870

Conversation

beorn7 commented Jun 15, 2022 • edited

codesome left a comment

Choose a reason for hiding this comment

beorn7 commented Jun 19, 2022

bboreham left a comment

Choose a reason for hiding this comment

bboreham Jun 19, 2022

Choose a reason for hiding this comment

beorn7 Jun 19, 2022

Choose a reason for hiding this comment

bboreham Jun 19, 2022

Choose a reason for hiding this comment

beorn7 Jun 19, 2022

Choose a reason for hiding this comment

beorn7 Jun 19, 2022

Choose a reason for hiding this comment

beorn7 Jun 21, 2022 • edited

Choose a reason for hiding this comment

beorn7 commented Jun 19, 2022

beorn7 commented Jun 21, 2022

beorn7 commented Jun 29, 2022

beorn7 commented Jun 15, 2022 •

edited

beorn7 Jun 21, 2022 •

edited