Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default --storage.tsdb.retention.time #13967

Open
jonathan-dev opened this issue Apr 22, 2024 · 6 comments
Open

Default --storage.tsdb.retention.time #13967

jonathan-dev opened this issue Apr 22, 2024 · 6 comments

Comments

@jonathan-dev
Copy link

What did you do?

I was running prom/prometheus:v2.37.6

What did you expect to see?

--storage.tsdb.retention.time is said to default to 15 documented here : https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects

What did you see instead? Under which circumstances?

The flags I see at /flags show --storage.tsdb.retention.time to be 0s

System information

No response

Prometheus version

No response

Prometheus configuration file

No response

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

No response

@machine424
Copy link
Collaborator

Thanks for reporting this. /flags returns the flags the user has set and the defaults as known to kingpin (the flags parser we use)

boilerplateFlags := kingpin.New("", "").Version("")
for _, f := range a.Model().Flags {
if boilerplateFlags.GetFlag(f.Name) != nil {
continue
}
cfg.web.Flags[f.Name] = f.Value.String()
}

For --storage.tsdb.retention.time, the default value we inform kingpin about is indeed 0s:

serverOnlyFlag(a, "storage.tsdb.retention.time", "How long to retain samples in storage. When this flag is set it overrides \"storage.tsdb.retention\". If neither this flag nor \"storage.tsdb.retention\" nor \"storage.tsdb.retention.size\" is set, the retention time defaults to "+defaultRetentionString+". Units Supported: y, w, d, h, m, s, ms.").
SetValue(&newFlagRetentionDuration)

as the zero value of
newFlagRetentionDuration model.Duration

15d is only used when neither --storage.tsdb.retention.time nor --storage.tsdb.retention.size is set. It's done here:

if cfg.tsdb.RetentionDuration == 0 && cfg.tsdb.MaxBytes == 0 {
cfg.tsdb.RetentionDuration = defaultRetentionDuration
level.Info(logger).Log("msg", "No time or size retention was set so using the default time retention", "duration", defaultRetentionDuration)
}

outside of kingpin (as it doesn’t seem to support such advanced dependencies/conditions)

What we can do, is make https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects as explicit as the description in

newFlagRetentionDuration model.Duration
and maybe add a sentence on the /flags page about how we consider the defaults.


We could set newFlagRetentionDuration as well to defaultRetentionDuration in

if cfg.tsdb.RetentionDuration == 0 && cfg.tsdb.MaxBytes == 0 {
cfg.tsdb.RetentionDuration = defaultRetentionDuration
level.Info(logger).Log("msg", "No time or size retention was set so using the default time retention", "duration", defaultRetentionDuration)
}
to make kingpin know about the "real/final" default, but I don't think it's a good idea.

@jonathan-dev
Copy link
Author

Thanks a lot for the quick and detailed answer. The reason I noticed this behavior was because the cloudflare pint tool was complaining about my retention time being 0s and thus being shorter than some of the ranges that I was querying (like 2m). After that I was a bit confused whats going on because when I was running a query it seemed like that there was a history kept but I was not able to validate that it was the 15days claimed or just the cleanup running infrequently. After checking I just noticed that the /status endpoint shows me the 15d retention time when I do not use the flags which is kind of what I was looking for. Noting it under the operational aspects probably would also helpful I think.

@machine424
Copy link
Collaborator

Let's start by making https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects as clera as

serverOnlyFlag(a, "storage.tsdb.retention.time", "How long to retain samples in storage. When this flag is set it overrides \"storage.tsdb.retention\". If neither this flag nor \"storage.tsdb.retention\" nor \"storage.tsdb.retention.size\" is set, the retention time defaults to "+defaultRetentionString+". Units Supported: y, w, d, h, m, s, ms.").
SetValue(&newFlagRetentionDuration)
then.

@tesla59
Copy link
Contributor

tesla59 commented Apr 23, 2024

Let's start by making https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects as clera as

serverOnlyFlag(a, "storage.tsdb.retention.time", "How long to retain samples in storage. When this flag is set it overrides \"storage.tsdb.retention\". If neither this flag nor \"storage.tsdb.retention\" nor \"storage.tsdb.retention.size\" is set, the retention time defaults to "+defaultRetentionString+". Units Supported: y, w, d, h, m, s, ms.").
SetValue(&newFlagRetentionDuration)

then.

Hi @machine424,
I have updated the doc as mentioned by u in the PR #13982
I'd like to further work on this task. Will work on this and let u know

@tesla59
Copy link
Contributor

tesla59 commented May 1, 2024

Hi @machine424
How should this issue be further proceeded?

@jonathan-dev
Copy link
Author

I think it would have also helped me to document the strange behavior that the /status page is showing 0 as default although it is the 15 days somewhere like https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects or the /status page itself. What are your thoughts on that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants