Need clarification on Thanos Query deduplication behaviour #7128
Please clarify how deduplication works in Thanos Query with the latest changes after the v0.32.3 release. Do I need to set `--query.replica-label=replica` in Thanos Query when the `replica` global external label in Prometheus is set to a per-instance value (`prometheus-0`, `prometheus-1`)?

If I instead use a `write_relabel_configs` rule (before remote write) to rewrite `replica` to one common value, and then deduplicate in Query on that same label, I see the following in the Receiver logs for queries over the latest 5 or 15 minutes:

```
matchers:<name:"__name__" value:"kube_configmap_created" > aggregates:COUNT aggregates:SUM without_replica_labels:"replica"
msg="Series: started fanout streams" status="store LabelSets: {receive="true", receive_replica="thanos-receive-0", ...
```

In other words, the global `replica` external label ends up with the same value (from both Prometheus replicas during remote write) in the Thanos Receiver WAL, and when I query with dedup enabled (`--query.replica-label=replica`), the series returned from the Receiver carry `without_replica_labels:"replica"`, i.e. the label is stripped. Is this acceptable behaviour? I need to understand how deduplication works in Query when two sets of metrics from Prometheus are written to Thanos and later queried through the Query component.
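For reference, the per-instance setup the first option describes would look roughly like this in the Prometheus configuration (a sketch; the `cluster` label and its values are illustrative, not from my actual setup):

```yaml
# Prometheus global external labels, set differently on each replica:
# prometheus-0 gets replica="prometheus-0", prometheus-1 gets replica="prometheus-1".
global:
  external_labels:
    cluster: eks-us-east-1    # illustrative cluster-identifying label
    replica: prometheus-0     # unique per Prometheus replica
```

With per-instance values like these, `--query.replica-label=replica` tells Thanos Query to treat series that differ only in the `replica` label as duplicates of one another.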
Hi,
Has the deduplication behaviour of Thanos Query changed since the v0.32.3 release due to the fix in #6697? We see a lot of metric gaps in our Grafana graphs when we query metric data over the last 3, 5, or 12 hours.
Setup
We have multiple EKS clusters spread across the us-east-1 and us-west-2 regions, where Prometheus (deployed by prometheus-operator as a StatefulSet with 2 replicas) remote-writes the metrics of each EKS cluster to our Thanos Receiver (a StatefulSet with multiple replicas). Each Prometheus instance has a `replica` external label pointing to that instance, e.g. `prometheus-0` and `prometheus-1`.
As metrics from the various EKS clusters are written to the Thanos Receiver WAL, we set a unique group-specific label on each metric (via the `write_relabel_configs` of the Prometheus remote-write), and a `receive_replica` label (specific to each Receiver instance) is added before the blocks are uploaded to the backend S3 bucket.
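A minimal sketch of the remote-write side described above (the endpoint URL, label name, and value are assumptions for illustration):

```yaml
remote_write:
  - url: https://thanos-receive.example.com/api/v1/receive   # assumed Receiver endpoint
    write_relabel_configs:
      # Attach a group-specific label to every series before remote write.
      - target_label: group                # hypothetical label name
        replacement: eks-us-east-1-group   # illustrative value
```

The `receive_replica` label is then attached on the Receiver side (via its external-label configuration), not by Prometheus.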
Thanos Query connects to the Receiver instances and to a sharded Store API, and is used as the data source for Grafana. In Thanos Query we currently pass `--query.replica-label` separately for the `replica` and `receive_replica` labels to deduplicate metric queries.
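The dedup configuration described above would look roughly like this on the Query container (the flag values follow the label names used above; the endpoints are placeholders):

```yaml
# Thanos Query container args (sketch; endpoint addresses are illustrative)
args:
  - query
  - --endpoint=thanos-receive-headless.monitoring.svc:10901   # assumed Receiver gRPC endpoint
  - --endpoint=thanos-store.monitoring.svc:10901              # assumed Store gRPC endpoint
  # Strip both replica labels during deduplication:
  - --query.replica-label=replica
  - --query.replica-label=receive_replica
```

Passing `--query.replica-label` once per label is how Thanos Query is told to ignore both labels when merging series.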
Queries over the most recent 5, 10, 15, or 30 minutes work fine; that data is served by the Receiver, which retains the latest metrics for up to 4 hours before pushing them to S3. The issue appears when we query longer ranges such as the last 3, 6, or 10 hours: gaps show up in the metric graphs, meaning the deduplicated set for a particular metric is dropped or not completely returned (by PromQL) to Grafana. There was no query timeout in Grafana. We also tested all Thanos components (Receiver, Query, and Store) with the latest v0.34.0 release. Note that if we disable dedup (i.e. remove both `replica` and `receive_replica` from `--query.replica-label` on Thanos Query), all the missing metrics for those time ranges become visible in the Grafana graph.
So, we need clarification on how this deduplication works with the latest fix for #6697, since I noticed `without_replica_labels:` set with the labels above (the global `replica` and `receive_replica`) in the queries returned by the Receiver and Store when dedup is enabled in Thanos Query.