DATA LOSS: thanos-compact deduplication is experimental and should not be enabled by default #290

Status: Open. jonasmatthias opened this issue on Dec 22, 2022.

jonasmatthias commented Dec 22, 2022

Deduplication in thanos-compact should not be enabled by default because it is an experimental feature. The example configuration in kube-thanos enables offline deduplication in thanos-compact on Prometheus replicas but does not set the correct deduplication strategy. This leads to data loss, because deduplication is irreversible.

The documentation explains:

This is a common case when Prometheus HA replicas are used. You can enable this deduplication strategy via the --deduplication.func=penalty flag.

The description of the --deduplication.replica-label flag in the code also makes clear that the default deduplication algorithm should NOT be used on HA Prometheus replicas:

Label to treat as a replica indicator of blocks that can be deduplicated (repeated flag). This will merge multiple replica blocks into one. This process is irreversible. Experimental. When one or more labels are set, compactor will ignore the given labels so that vertical compaction can merge the blocks. Please note that by default this uses a NAIVE algorithm for merging which works well for deduplication of blocks with precisely the same samples like produced by Receiver replication. If you need a different deduplication algorithm (e.g one that works well with Prometheus replicas), please set it via --deduplication.func.

I learned about this via

Since #164, offline deduplication in the compactor has been enabled by default on the prometheus_replica label (and on rule_replica), but the --deduplication.func=penalty flag is not set:

- --deduplication.replica-label=prometheus_replica
- --deduplication.replica-label=rule_replica
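One way to address the missing strategy while keeping offline deduplication is to add the --deduplication.func=penalty flag that the documentation recommends for Prometheus HA replicas. The sketch below is a hypothetical excerpt of the compactor container spec: the container name and surrounding layout are assumptions, the other compactor flags are elided, and it assumes the penalty algorithm is also acceptable for the rule_replica blocks, which this sketch does not verify.

```yaml
# Hypothetical excerpt of the thanos-compact container spec (name and layout assumed).
containers:
- name: thanos-compact
  args:
  - compact
  # ... other compactor flags unchanged ...
  - --deduplication.replica-label=prometheus_replica
  - --deduplication.replica-label=rule_replica
  # Use the penalty-based algorithm recommended for Prometheus HA replicas
  # instead of the default naive merge.
  - --deduplication.func=penalty
```

The single --deduplication.func flag selects the algorithm for all configured replica labels; it does not make the operation reversible, so the existing blocks should still be backed up before enabling it.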

It might be better to disable offline deduplication by default, since it is an experimental feature.
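Per the flag description quoted above, the compactor only ignores replica labels, and therefore only merges replica blocks, when one or more --deduplication.replica-label values are set. A minimal sketch of disabling offline deduplication therefore just omits those flags. As with any excerpt here, the container name and layout are assumptions and the remaining flags are elided.

```yaml
# Hypothetical excerpt: offline deduplication disabled (name and layout assumed).
containers:
- name: thanos-compact
  args:
  - compact
  # ... other compactor flags unchanged ...
  # No --deduplication.replica-label flags: blocks from different replicas keep
  # their distinct external labels and are compacted separately, not merged.
```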

jonasmatthias changed the title from "thanos-compact deduplication on prometheus_replica requires deduplication.func=penalty" to "data loss: thanos-compact deduplication is experimental and should not be enabled by default" on Feb 13, 2023.
jonasmatthias changed the title from "data loss: thanos-compact deduplication is experimental and should not be enabled by default" to "DATA LOSS: thanos-compact deduplication is experimental and should not be enabled by default" on Feb 13, 2023.