
Evaluate using Profile-Guided Optimization (PGO) #1689

Open · zamazan4ik opened this issue May 4, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

@zamazan4ik commented May 4, 2024

Hi!

I just read your article about optimizing graphql-lint's performance. Since I have tested one specific compiler optimization, Profile-Guided Optimization (PGO), on various projects with positive results (you can find all the benchmarks here: https://github.com/zamazan4ik/awesome-pgo), I decided to test the optimization on graphql-lint as well.

Test environment

  • Fedora 39
  • Linux kernel 6.8.7
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • Samsung 980 Pro 2 TB SSD
  • Compiler: rustc 1.78
  • Project version: the latest from the main branch at commit 5605d62f69790f62a385e8155bddf838f977165b
  • Turbo Boost disabled

Benchmark

For the benchmark, I use the project's built-in benchmarks. For PGO optimization I use the cargo-pgo tool. The Release results were collected with the cargo bench command; the PGO training phase was done with cargo pgo bench, and the PGO optimization phase with cargo pgo optimize bench.
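
For reference, the full workflow looks roughly like this (a sketch; it assumes cargo-pgo and the llvm-tools-preview rustup component are installed):

```bash
# One-time setup for PGO builds
cargo install cargo-pgo
rustup component add llvm-tools-preview

# Baseline: plain release benchmarks
cargo bench

# PGO training phase: builds an instrumented binary and runs the
# benchmarks to collect .profraw profiles
cargo pgo bench

# PGO optimization phase: rebuilds using the collected profiles and
# re-runs the benchmarks, so Criterion reports the change vs. baseline
cargo pgo optimize bench
```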

All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).

Results

I got the following results:

PGO optimized compared to Release:

```
     Running benches/benchmark.rs (/home/zamazan4ik/open_source/grafbase/target/x86_64-unknown-linux-gnu/release/deps/benchmark-107057c07804eda6)
Benchmarking lint schema
Benchmarking lint schema: Warming up for 3.0000 s
Benchmarking lint schema: Collecting 100 samples in estimated 5.0165 s (162k iterations)
Benchmarking lint schema: Analyzing
lint schema             time:   [30.072 µs 30.093 µs 30.110 µs]
                        change: [-20.761% -20.644% -20.540%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
```

(just for reference) PGO instrumentation compared to Release:

```
     Running benches/benchmark.rs (/home/zamazan4ik/open_source/grafbase/target/x86_64-unknown-linux-gnu/release/deps/benchmark-107057c07804eda6)
Benchmarking lint schema
Benchmarking lint schema: Warming up for 3.0000 s
Benchmarking lint schema: Collecting 100 samples in estimated 5.3109 s (71k iterations)
Benchmarking lint schema: Analyzing
lint schema             time:   [75.332 µs 75.360 µs 75.389 µs]
                        change: [+98.777% +99.032% +99.272%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
```

According to the results, PGO measurably improves the tool's performance, at least in the benchmark above (roughly a 20% improvement). However, I think the benchmarks should be performed on more datasets.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks with other test files. If they show improvements, add a note to the documentation (the README file?) about the possible improvements in the tool's performance with PGO.
  • Optimize the prebuilt binaries with PGO. As a training set, you can gather multiple real-life files, train PGO on them, and deliver PGO-pre-optimized binaries to users (a sketch of this workflow follows below).
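
A minimal sketch of that pre-optimization workflow, assuming a directory of representative real-life schemas; the schemas/ path, binary name, and lint invocation below are placeholders rather than the project's actual CLI:

```bash
# Build an instrumented binary (cargo-pgo builds into an explicit
# target-triple directory)
cargo pgo build

# Training phase: run the instrumented binary over real-life inputs
# (hypothetical paths and subcommand)
for schema in schemas/*.graphql; do
    ./target/x86_64-unknown-linux-gnu/release/grafbase lint "$schema"
done

# Rebuild with the collected profiles; this is the binary to ship
cargo pgo optimize
```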

Testing post-link optimization techniques (like LLVM BOLT) would be interesting too (Clang and rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
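
For completeness, cargo-pgo also wraps BOLT; a sketch, assuming llvm-bolt is installed and available on PATH:

```bash
# Build a BOLT-instrumented binary (optionally on top of PGO)
cargo pgo bolt build --with-pgo

# ...run the training workload as above, then apply the BOLT profiles
cargo pgo bolt optimize --with-pgo
```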

I would be happy to answer your questions about PGO.

@zamazan4ik added the enhancement (New feature or request) label on May 4, 2024

@yoav-lavi (Collaborator) commented May 9, 2024

@zamazan4ik Thank you! Looks very interesting.

My understanding is that PGO optimizes based on the input in this case; however, do we know that we didn't optimize specifically for the benchmark schema? (As in, while there may be some commonalities between inputs, a schema is still somewhat random in terms of what it can contain.)

Do these tools output any sort of indication as to what changes they're making? Since what you're benchmarking is directly used for the optimization, it'd be hard to know whether the tool is generally faster or only faster for this specific benchmark.

Thank you!

@zamazan4ik (Author) commented

(Please excuse the very late response.)

> My understanding is that PGO optimizes based on the input in this case; however, do we know that we didn't optimize specifically for the benchmark schema? (As in, while there may be some commonalities between inputs, a schema is still somewhat random in terms of what it can contain.)

I'd guess that many inputs exercise similar internal code paths in the tool, so it should be safe to prepare a "real life" training dataset, use it during pre-optimization, and deliver a PGO-pre-optimized linter that works well for real users.

> Do these tools output any sort of indication as to what changes they're making?

If we are talking about PGO: it's not a dedicated tool, it's part of the compiler. In the general case, no, the compiler doesn't report the changes it makes to your program with PGO; that's internal to the compiler. You should expect different inlining decisions, hot/cold code splitting, and similar things. If you want to understand more, I suggest using a disassembler to compare the non-PGO and PGO assembly of the tool.
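
A quick way to eyeball those differences is to diff the disassembly of the two binaries (a sketch; the binary name and paths are illustrative):

```bash
# Build the non-PGO and PGO binaries
cargo build --release
cargo pgo optimize

# Disassemble both and diff (note that cargo-pgo builds into a
# target-triple subdirectory)
objdump -d target/release/grafbase > release.asm
objdump -d target/x86_64-unknown-linux-gnu/release/grafbase > pgo.asm
diff -u release.asm pgo.asm | less
```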

> Since what you're benchmarking is directly used for the optimization, it'd be hard to know whether the tool is generally faster or only faster for this specific benchmark.

Fair point. You can prepare one dataset for training and a separate one for evaluation, and then run the benchmarks again. I am 99% sure that PGO will bring the same performance benefits for this tool in that scenario too, since the code paths will be similar.
