Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

appender: prune old files at startup #2966

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kaffarell
Copy link
Contributor

@kaffarell kaffarell commented May 10, 2024

The prune_old_logs function only gets called when we rollover the appender. The problem is that at startup we create an initial log file without rolling over, if e.g., the executable then exits before rolling over for the first time, we never call prune_old_logs.

E.g.: If we have a tracing-appender that rolls over every minute and we have a max_files parameter of 2. Running the program (which prints a single log line and exits) every minute, will result in a new log file everytime without deleting the old ones.

Fixes: #2937

The prune_old_logs function only gets called when we rollover the
appender. The problem is that at startup we create an initial log file
without rolling over, if e.g., the executable then exits before rolling
over for the first time, we never call prune_old_logs.

E.g.: If we have a tracing-appender that rolls over every minute and we
have a max_files parameter of 2. Running the program (which prints a
single log line and exits) every minute, will result in a new log file
everytime without deleting the old ones.

Fixes: tokio-rs#2937
@kaffarell kaffarell requested a review from a team as a code owner May 10, 2024 15:36
Copy link

@nfagerlund nfagerlund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool, I've got an app that's affected by this bug — it's behind a scheduling proxy that starts and kills it according to demand, and it never gets to live for an entire log rollover cycle.

User expectations and correctness

So here's the range of cases, with problematic or unexpected behaviors in bold. "Long" lifetime means the app lives for many rotation periods in a row, "short" means it lives for less than a full rotation period.

App lifetime First startup in a Rotation Subsequent startups in same Rotation Awake during rollover
Long (current) Files can grow to max + 1 n/a Files pruned to max
Short (current) Files can grow unbounded Files unchanged n/a
Long (this PR) Files pruned to max n/a Files pruned to max
Short (this PR) Files pruned to max Files pruned to max - 1 n/a

This is because:

  • Current behavior never prunes unless a rollover hits while the app is running.
  • This PR always prunes on startup.
  • The prune_old_logs method unconditionally ensures that there's one fewer than max files, because it assumes it's being guarded by a "should roll over" check and we're about to create a new file and go back to max. (But for an app that might start and die many times within the same Rotation period, that assumption is only correct for the first startup within a given Rotation.)

On consideration, I think it's fine to occasionally prune to max - 1 — after all, it's a maximum, not a guaranteed count. However, this needs an edit to documentation so users can calibrate their expectations — if you must retain N days of logs, then max = N + 1!

Performance concerns

Pruning in Inner::new() will technically work, but it increases the amount of blocking filesystem work that has to be done when creating the writer.

What do you think about deferring the initial prune until it performs its first write after creation? That way, blocking-sensitive applications (like an edge function that needs fast cold startup time) can still take advantage of this crate's NonBlockingWriter to push that pruning cost off the critical path.

It looks like everything's already set up to do that properly, IF Inner::should_rollover happens to return an affirmative on the first check. I think you can provoke this by changing line 531 of Inner::new() to be let next_date = rotation.round_date(&now); — the inner will still start out with the correct writer filename, but refresh_writer will get called on the first write and will immediately re-open that correct file after doing a prune.

This of course will need a comment in the code to explain why the "next" date is starting out in the past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

max_log_files can‘ work
2 participants