
Provide a tool to see what user code is causing rustc to use lots of time #51

Open
wesleywiser opened this issue Aug 15, 2019 · 15 comments
Labels: enhancement (New feature or request), T-measureme (Related to the core measureme library), T-summarize (Related to the summarize tool)

Comments

@wesleywiser (Member)

We should have a way to calculate how long each item in your codebase takes to compile. Conceptually this would work by grouping the measureme data by DefId/NodeId or some other identifier rather than by the query name. We can then output a sorted list of items by how long each takes.

For example, the output might look like this:

$ summarize code-profile /path/to/rustc-results

Total time in rustc: 120 seconds

| % time | Item                        |
|--------|-----------------------------|
| 20.4%  | example::foo::bar::clone()  |
| 10.2%  | example::baz::widget::bla() |

(more rows)

This will require changes to the profiling code in rustc to record DefIds or NodeIds for each query.
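
A minimal sketch of how the aggregation step could work, assuming the trace has already been decoded into events that carry an item path and a self-time. The Event type and its fields are hypothetical placeholders, not measureme's actual format:

use std::collections::HashMap;
use std::time::Duration;

// Hypothetical decoded event: which item a query ran for and its self-time.
struct Event {
    item: String, // e.g. "example::foo::bar::clone"
    self_time: Duration,
}

fn summarize_by_item(events: &[Event]) {
    let total: Duration = events.iter().map(|e| e.self_time).sum();

    // Group by item (DefId path) instead of by query name.
    let mut per_item: HashMap<&str, Duration> = HashMap::new();
    for e in events {
        *per_item.entry(e.item.as_str()).or_default() += e.self_time;
    }

    // Sort descending by time and print the percentage table.
    let mut rows: Vec<(&str, Duration)> = per_item.into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1));

    println!("Total time in rustc: {} seconds", total.as_secs());
    for (item, time) in rows {
        let pct = 100.0 * time.as_secs_f64() / total.as_secs_f64();
        println!("| {:5.1}% | {} |", pct, item);
    }
}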

@wesleywiser added the enhancement, T-summarize, and T-measureme labels on Aug 15, 2019
@michaelwoerister (Member)

Some more notes here: once we record query keys and function parameters, we basically have the whole query graph in our trace. That lets us do some quite sophisticated analysis. However, there are also some limitations:

  • A big part of compilation is spent in LLVM, and (at least right now) we can only measure things at the codegen-unit level. That means we know how long it takes to optimize the whole object file, but we can't tell how that time is divided up between the individual functions in there. Maybe LLVM has some facilities to get more information, though.
  • Because query results are cached, it can be tricky to attribute compile time to a single function. E.g. if both function foo() and function bar() trigger computing the type info for struct Foo, then where do you count the time it took to compute Foo's type info?
  • On the other hand, given the query graph, it is easy to find things that have been triggered by only one function, and one could also do a "simulation/approximation" of what compile times would look like if a given function were deleted from the code base (at least for the single-threaded case).

@Eh2406 commented Sep 20, 2019

Some thoughts from an outsider, worth exactly what you paid for.

A big part of compilation is spent in LLVM

I would guess that LLVM IR is a pretty good proxy for the proportion of time spent. So a tool that tells me, for each line of code in some file, how much LLVM IR is transitively generated sounds very useful. I also think this may make it possible for non-rustc devs to find issues like rust-lang/rust#64572 by asking "why is iter.all taking half of my generated LLVM IR?"

Because query results are cached, it can be tricky to attribute compile time to a single function.

I think there are two answers to this that may be worth looking at. One is to amortize: charge each user (time to compute the cached result) / (number of times it is used). This is useful for aggregating, as it sums up to the real total time. The other is to double count: anything that uses a cached result gets charged the full time to compute the result. The sum no longer makes sense, but it helps make clear what is expensive to compile.
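
A sketch of both attribution schemes, assuming we can recover, for each cached query result, its compute cost and the list of items that used it; the QueryUse type below is hypothetical:

use std::collections::HashMap;
use std::time::Duration;

// Hypothetical record: one computed query result, its compute cost,
// and every item whose compilation used that result.
struct QueryUse {
    cost: Duration,
    users: Vec<String>,
}

// Returns per-item (amortized, full-charge) attributions.
fn attribute(queries: &[QueryUse]) -> HashMap<String, (Duration, Duration)> {
    let mut charges: HashMap<String, (Duration, Duration)> = HashMap::new();
    for q in queries {
        if q.users.is_empty() {
            continue;
        }
        // Scheme 1: split the cost evenly among users.
        let amortized = q.cost / q.users.len() as u32;
        for user in &q.users {
            let entry = charges.entry(user.clone()).or_default();
            entry.0 += amortized; // amortized: sums to the real total time
            entry.1 += q.cost;    // full charge: double counts shared work
        }
    }
    charges
}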

"simulation/approximation"

The coz profiler tries to estimate "if I speed up this part by x%, how much faster does the total process get?". I have never used it, but it sounds interesting.

@alexcrichton (Member)

Carrying over some discussion from the forums: I agree with you @wesleywiser that it's a tricky question and we can erroneously blame the wrong code, but I also agree with @Eh2406 that the sums may not end up mattering too much in practice.

For the problem you brought up in the forums, I'd actually expect that layout_of(Vec<u8>) -> TyLayout would get "blamed" on Vec itself rather than on whoever requested Vec. It may be a little harder at that point to go through and actually figure out who requested Vec<u8> so much, but that's still a lot better than what we have today anyway :)

An example of what I'd love to get from this: I recently changed the jobserver crate to stop using std::sync::mpsc for synchronization and to use Mutex and Condvar instead. That shaved about 33% off the crate's compile time, because std::sync::mpsc brings in so much code to monomorphize.

For this sort of question ("what in my code is taking so long to compile?") I don't think we necessarily need precise, 100% accurate information; just seeing a huge amount of stuff pointing at std::sync::mpsc would have been a great head start in tracking things down.

Another example for this is the discovery of rust-lang/rust#64140, which only really happened by accident. I had no idea that using Command was bringing in thousands of lines of IR, so some sort of "blame" tool, even a pretty coarse one, would have set off alarms about where to start looking.

@nikomatsakis

I was discussing this a bit with @michaelwoerister. My main thought here, which perhaps backs up @alexcrichton's point:

I think a common problem is excessive instantiation of generics. It'd be great if we could measure the amount of LLVM code generated by the various bits of your Rust source. I'm not sure if the self-profile tool can be adapted to this or not; it's obviously a bit beyond the original spec of measuring runtime. I suppose that being able to list all the things instantiated within a CGU (by size) would also be a decent tool, since then you could say "ah, I see that optimizing this CGU is taking a long time, what's in there?" and that might guide you to the problem.
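
A sketch of such a per-CGU listing, assuming (CGU, item, size) records were emitted during partitioning; the CguItem type and the size metric are hypothetical:

// Hypothetical record emitted during CGU partitioning.
struct CguItem {
    cgu: String,
    item: String,
    size_estimate: usize, // e.g. MIR statement count or LLVM IR lines
}

// Print the contents of one codegen unit, largest items first.
fn list_cgu(records: &[CguItem], cgu_name: &str) {
    let mut items: Vec<(&str, usize)> = records
        .iter()
        .filter(|r| r.cgu == cgu_name)
        .map(|r| (r.item.as_str(), r.size_estimate))
        .collect();
    items.sort_by(|a, b| b.1.cmp(&a.1));
    for (item, size) in items {
        println!("{:>8} | {}", size, item);
    }
}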

@alexcrichton (Member)

Actually, one thing that I think would be really cool is to have a sunburst diagram for LLVM IR (the original comment embedded an example image). The first ring around the center could be the crate, and then each layer deeper could be a module, eventually getting to individual functions. I feel like that alone would be very informative: you could just say "oh wow, serde is 90% of my crate?!" and go from there.

Now it's true though that the amount of LLVM code is only a proxy for the number we care about, compile time, so it's not perfect.

@bjorn3 (Member) commented Sep 24, 2019

Sunbursts have the disadvantage that your eye sees the area of a section, but the angle is what actually matters, so sections at the outside look bigger than sections near the center. I prefer flamegraphs.

The first ring around the center could be crate, and then each layer deeper could be a module, eventually getting to individual functions.

+1

@michaelwoerister (Member)

It'd be great if we could measure the amount of LLVM code generated by various bits of your rustc source

Huh, we could even record events in the monomorphization collector and the CGU partitioner, and then one would know exactly where instantiation happens.
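
A sketch of how such events could be consumed, assuming each one records the instance created and the item that requested it; the InstantiationEvent type is hypothetical, not measureme's actual API:

use std::collections::HashMap;

// Hypothetical event recorded by the monomorphization collector:
// which generic instance was created and which item requested it.
struct InstantiationEvent {
    instance: String,     // e.g. "core::iter::Iterator::all::<...>"
    requested_by: String, // the item whose code triggered the instantiation
}

// With such events in the trace, "where does all this code come from?"
// becomes a simple aggregation: count instantiations per requesting item.
fn instantiations_per_item(events: &[InstantiationEvent]) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for e in events {
        *counts.entry(e.requested_by.clone()).or_insert(0) += 1;
    }
    let mut rows: Vec<(String, usize)> = counts.into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}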

@wesleywiser (Member, Author)

Looks like Clang just got a similar feature: https://www.snsystems.com/technology/tech-blog/clang-time-trace-feature

@andjo403 (Contributor)

I was hacking the LLVM support into rustc yesterday to see what was possible. I do not know if Chrome tracing is the best format, and it was only possible to use -C codegen-units=1 because the LLVM code was not thread-safe. But it was fun to see some info on where the time was spent in LLVM, per function.

@michaelwoerister (Member)

Nice! This looks like it should be able to provide the information we want. @andjo403, there's also -Z no-parallel-llvm, which should help with the thread-safety problem. It's a shame that the tooling doesn't seem to be thread-safe, though.

@andjo403 (Contributor) commented Oct 4, 2019

I have made a POC adding the Clang time-trace feature to the self-profiling, but I do not know if it is something that can be upstreamed. You can find the code here: https://github.com/andjo403/rust/tree/llvmProfiler

@michaelwoerister (Member)

@andjo403 Cool. We'll probably come back to this some time later this year.

@luser commented Nov 1, 2019

I think it'd be great to figure out a way to integrate any work here with cargo's -Z timings feature. I tried that out on a crate of mine and noticed that one of my sub-crates spends a lot of time compiling before codegen. It'd be great to be able to build with rustc profiling enabled so I could drill down in that graph to get an idea of where all that time is being spent.

@kettle11

I spent a number of hours yesterday removing parts of my codebase in an effort to track down a compile time increase.

A tool like this would save a massive amount of time, and make it easier for crate creators to make informed decisions.

I’d very much like to see progress made here!

@nnethercote commented May 15, 2020

I was recently told about cargo-llvm-lines. It tells you how many lines of LLVM IR are produced for each function/method, and how many times each function/method is instantiated. (The latter number can get surprisingly high for generic functions/methods.)

It's great. I've already used it for rust-lang/rust#72013, rust-lang/rust#72166, and rust-lang/rust#72139. Although the number of lines of LLVM IR is only a proxy for the time taken to compile the code, it's good enough in practice to give useful results. As Alex said above, rough information is much better than no information.
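
For reference, a typical invocation looks roughly like this (the output shape is approximate and the numbers are invented):

$ cargo install cargo-llvm-lines
$ cargo llvm-lines

  Lines         Copies        Function name
  -----         ------        -------------
  30737 (100%)  1107 (100%)   (TOTAL)
   1395 (4.5%)    83 (7.5%)   core::ptr::drop_in_place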
