docs: add post-mortem for avoid global sort and move it #7130

Open
wants to merge 1 commit into
base: main
@@ -1,9 +1,9 @@
---
type: proposal
title: Avoid Global Sort on Querier Select
-status: approved
+status: reverted
owner: bwplotka,fpetkovski
-menu: proposals-accepted
+menu: proposals-done
---

## Avoid Global Sort on Querier Select
@@ -186,3 +186,29 @@ set := &promSeriesSet{
warns: warns,
}
```

## Post-mortem of proposal

We implemented an early version of this but immediately ran into correctness issues. The root cause, which produced inaccurate query responses, was that removing (or adding) labels on a sorted set of labelsets can scramble its order. This broke deduplication, which depends on receiving series in sorted order. In simpler terms, to deduplicate accurately we must know at all times whether we have received all replicas for a given labelset; deduplication therefore only works on sorted series sets.

Consider the following series labels, in sorted order:

```
a=1,b=1
a=1,b=2
a=2,b=1
a=2,b=2
```

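If the replica label is `a`, stripping it turns the response into the following, which is no longer sorted. The two `b=1` series are not adjacent, so deduplication cannot tell when it has seen every replica of `b=1`: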

```
b=1
b=2
b=1
b=2
```

Theoretically, we could address this in the distant future by modifying the ordering rules within the TSDB, but I'm uncertain if that will ever materialize.

Nevertheless, despite the initial implementation challenges and the subsequent revert, we have made significant improvements to the querying internals. For example, we now use a ProxyHeap in multiple components rather than repeating the same logic in each one.
Contributor


I mean, we are still avoiding the global sort by stripping replica labels in the stores; we just have to re-sort in the stores afterwards. The sorting is no longer done in a global context but on the Store APIs, right? So it's not a full revert, I think.

Member Author


Mhm, why do we force resorting here: https://github.com/thanos-io/thanos/blob/main/pkg/store/proxy_heap.go#L757-L761? Shouldn't we check whether the server supports without replica labels?

Contributor


That is a good question; stores should always return sorted responses.

Edit: Ah, I added that. It's because this is used for block clients in the bucket store. We remove labels in the block clients and then put them onto a proxy heap; if we didn't re-sort there, the bucket store would have the same issues we have in the querier, because it acts like a proxy heap for blocks. So we need to make them return sorted responses after removal of the external labels. We cannot rely on sorted responses there. Maybe it would have been more elegant to do the sorting in the block series client, because technically it is the one violating the sorting constraint for Store APIs, I think.