Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache file names in fill_todo #144

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

osiewicz
Copy link

@osiewicz osiewicz commented Apr 27, 2024

Background:
While working with cargo, I've noticed that it takes ~30s to cargo clean -p with large enough target directory (~200GB). Under a profiler, it turned out that most of the time was spent retrieving paths for removal via glob::glob in rm_rf_glob (and not actually removing the files).
image

Change description:
In call to .sort_by, we repetitively parse the paths to obtain file names for comparison. This commit caches file names in PathWrapper object, akin to #135 that did so for dir info.

For my use case, a cargo build using that branch takes ~14s to clean files instead of previous 30s (I've measured against main branch of this repository, to account for changes made since glob 0.3.1). Still not ideal, but hey, we're shaving 50% of time off for a bit heavier memory use.

Background:
While working with cargo, I've noticed that it takes ~30s to cargo clean -p with large enough target directory (~200GB).
With a profiler, it turned out that most of the time was spent retrieving paths for removal in https://github.com/rust-lang/cargo/blob/eee4ea2f5a5fa1ae184a44675315548ec932a15c/src/cargo/ops/cargo_clean.rs#L319 (and not actually removing the files).

Change description:
In call to .sort_by, we repetitively parse the paths to obtain file names for comparison. This commit caches file names in PathWrapper object, akin to rust-lang#135 that did so for dir info.

For my use case, a cargo build using that branch takes ~14s to clean files instead of previous 30s (I've measured against main branch of this directory, to account for changes made since 0.3.1). Still not ideal, but hey, we're shaving 50% of time off for a bit heavier memory use.
bors added a commit to rust-lang/cargo that referenced this pull request May 2, 2024
…ihanglo

Clean package perf improvements

### What does this PR try to resolve?

I've noticed that `cargo clean -p` execution time scales poorly with size of target directory; in my case (~250GB target directory on M1 Mac) running `cargo clean -p` takes circa 35 seconds. Notably, it's the file listing that takes that time, not deleting the package itself. That is, when running `cargo clean -p SOME_PACKAGE` twice in a row, both executions take roughly the same time.

I've tracked it down to the fact that we seem quite happy to use `glob::glob` function, which iterates over contents of target dir. It also was a bit sub-optimal when it came to doing that, for which I've already filled a PR in rust-lang/glob#144 - that PR alone takes down cleaning time down to ~14 seconds. While it is a good improvement for a relatively straightforward change, this PR tries to take it even further. With glob PR applied + changes from this PR, my test case goes down to ~6 seconds. I'm pretty sure that we could squeeze this further, but I'd rather do so in a follow-up PR.

Notably, this PR doesn't help with *just* super-large target directories. `cargo clean -p serde` on cargo repo (with ~7Gb target directory size) went down from ~380ms to ~100ms for me. Not too shabby.

### How should we test and review this PR?

I've mostly tested it manually, running `cargo clean` against multiple different repos.

### Additional information
TODO:
- [x] [c770700](c770700) is not quite correct; we need to consider that it changes how progress reporting works; as is, we're gonna report all progress relatively quickly and stall at the end (when we're actually iterating over directories, globbing, removing files, that kind of jazz). I'll address that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant