Parsing performance regression testing #14599

nrmancuso · 2024-03-03T18:30:41Z

As discovered in #14566 (and previously elsewhere), parsing performance is a critical, user-facing metric that we should measure and verify consistently. We have discussed this topic before, but no action was taken, and we released a "bad" version to users.

A quick hack could be to time the execution of the OpenJDK no-error CI tasks; however, this can end up being quite inconsistent because we download the repository to be parsed during report generation. Additionally, variables such as the amount of resources allocated to the JVM play a role.

A slightly better approach would be to improve our AST regression testing to report Checkstyle execution time as a percentage difference between master and the feature branch, since both runs execute sequentially on the same machine with identical resource allocation.
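A minimal sketch of how such a percentage could be computed and phrased in a report, assuming one measurement per branch (the function names are hypothetical, not part of any existing Checkstyle script). Expressing the result as a change in time or memory, rather than "X% faster", avoids ambiguity: 33% less time corresponds to roughly 1.5× throughput.

```python
def percent_change(baseline, candidate):
    """Signed percent change from baseline (master) to candidate (feature branch)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def describe(metric, baseline, candidate):
    """One report line; for both time and memory, lower is better."""
    change = percent_change(baseline, candidate)
    if change == 0:
        return f"{metric}: no change on the feature branch"
    direction = "less" if change < 0 else "more"
    return f"{metric}: {abs(change):.1f}% {direction} on the feature branch"
```

With hypothetical numbers, `describe("time", 60.0, 40.0)` returns `"time: 33.3% less on the feature branch"` and `describe("memory", 1000.0, 900.0)` returns `"memory: 10.0% less on the feature branch"`.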

Probably the best approach would be to create a new script that times only Checkstyle execution and captures other metrics, such as memory usage, without all the I/O involved in report generation.

For example, our generated report could tell us that parsing was 33% faster on the feature branch, used 10% less memory, etc.
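A rough sketch of the "time only Checkstyle execution" idea, assuming a Unix host: it runs one command as a child process, records wall-clock time, and reads the child's peak resident set size via `resource.getrusage(RUSAGE_CHILDREN)`. The function name is hypothetical, and the Checkstyle command line is left to the caller.

```python
import resource
import subprocess
import time


def measure_run(cmd):
    """Run cmd once and return (wall_seconds, peak_rss) for the child process.

    peak_rss comes from RUSAGE_CHILDREN, i.e. the largest peak resident set
    size among all children this process has waited for so far, so use a
    fresh Python process per measured command. Linux reports it in
    kilobytes; macOS reports bytes.
    """
    start = time.perf_counter()
    # Discard console output so printing to a terminal does not add to the timing.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss
```

A call might look like `measure_run(["java", "-jar", "checkstyle-all.jar", "-c", "config.xml", "jdk/src"])`, where the jar, config, and source paths are placeholders.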

Lmh-java · 2024-03-29T00:22:38Z

@nrmancuso I would like to give it a shot, but I want to discuss a few things first to get more insight before starting.

I want to follow the best approach. Therefore, I will introduce a new test task and create a new script that benchmarks multiple metrics of Checkstyle execution. I think it would be better if we also integrate this into CI (just like how we use the "GitHub, generate report" command).

To benchmark, ideally we would run Checkstyle on some large projects that take a relatively long time to process. Do you have suggestions on which project we should use as the benchmark sample? Or do we want to run it over many different projects? (In that case, it would become part of the original regression test.)

Another question: do you prefer to run this benchmark inside the original regression test procedure (changing the original code), or in a completely isolated new script? IMO, it would be better to run the benchmark in a completely isolated procedure, for the following reasons:

Easier to measure the metrics accurately.
Easier to develop and maintain
Developers can use these two functions separately

PS: if any of my understanding above is inaccurate, please correct me :)

nrmancuso · 2024-03-30T13:16:20Z

> To benchmark, ideally, it would be beneficial for us to run Checkstyle on some large projects that take relatively longer execution time. Do you have some suggestions on which project should we use as the benchmark sample? Or do we want to do it over many different projects? (In this case, this will be a part of the original regression test)

The latest OpenJDK has proven to be a good codebase to test on, for both performance and memory consumption.

Lmh-java · 2024-03-30T16:27:09Z

I will start right away on building a new GitHub Actions task for measuring both performance and memory consumption on OpenJDK.

nrmancuso · 2024-03-30T16:41:26Z

> I will start right now to build a new github action task for measuring both performance and memory consumption on openJDK.

I would suggest mapping out your high-level plan here to make sure we are all on the same page.

Also, it would be a good idea to create a POC in your own fork of Checkstyle and share it here to demonstrate what this solution will look like.

Lmh-java · 2024-03-31T02:33:14Z

Draft Plan

Goal

Implement a new CI test task that benchmarks Checkstyle execution and compares the metrics between the new patch and the original version.

Usage

This benchmark task runs alongside the other CI jobs whenever a PR is created or updated.

Plan

Create a new GitHub Actions workflow in .github/workflows that performs the following steps:

Git clone the JDK repo as the benchmark target.
Download the baseline file from our repo and parse it.
Git clone Checkstyle from the patch branch, validate, and package it.
Run time to measure the execution time of the new patch (this is repeated several times and the average is taken).
Compare the benchmark result with the baseline and calculate the percentage difference. If the difference is larger than a predefined threshold, exit with an error code.
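The "repeat several times and average" step could be sketched as below, assuming Python 3 on the CI runner. The names `time_runs` and `summarize` are hypothetical, and the single discarded warm-up run is an added assumption (not part of the plan above) meant to keep cold-cache effects out of the average.

```python
import statistics
import subprocess
import time


def time_runs(cmd, runs=3, warmup=1):
    """Wall-clock durations in seconds of `runs` timed executions of cmd.

    The first `warmup` executions are run but not recorded, so one-off costs
    such as a cold file-system cache do not skew the average.
    """
    durations = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        if i >= warmup:
            durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Mean and sample standard deviation; a large stdev signals a noisy runner."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev
```

Reporting the standard deviation next to the mean gives a rough sense of how large the regression threshold needs to be to avoid false alarms on shared CI machines.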

PS: for the POC, we use time to measure only execution time. Later on, we can switch to GNU time to also measure memory (RAM) usage.
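For the GNU time variant, a small sketch, assuming GNU time is installed at `/usr/bin/time` (the shell's built-in `time` keyword cannot report memory). The format `%e %M` asks for elapsed real seconds and maximum resident set size in kilobytes, and `-o` writes the report to a file so it does not mix with Checkstyle's own stderr. The helper names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# GNU time format: %e = elapsed real seconds, %M = max resident set size in KB.
GNU_TIME_FORMAT = "%e %M"


def parse_gnu_time(text):
    """Parse output of -f "%e %M" into (seconds, max_rss_kb).

    Only the last line is used, because GNU time may prepend a line such as
    "Command exited with non-zero status N".
    """
    seconds, max_rss_kb = text.strip().splitlines()[-1].split()
    return float(seconds), int(max_rss_kb)


def run_with_gnu_time(cmd, time_binary="/usr/bin/time"):
    """Run cmd under GNU time and return (seconds, max_rss_kb)."""
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "time.txt"
        subprocess.run(
            [time_binary, "-f", GNU_TIME_FORMAT, "-o", str(report), *cmd],
            check=True,
        )
        return parse_gnu_time(report.read_text())
```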

Lmh-java · 2024-03-31T02:39:44Z

@nrmancuso This is a draft high-level plan; please take a look. I will be working on a sample in my forked repo over the next few days.

romani · 2024-04-01T01:50:08Z

We just need to use https://man7.org/linux/man-pages/man1/time.1.html on a command that runs parsing only (a config with no modules under TreeWalker, or a single lightweight module).
In the repository we will store a baseline text file with a reference value for real execution time.
The PR execution generates a new real-time value from the time tool.
The two values are compared; if the PR's execution time exceeds the baseline by more than a percentage that we define, we exit with an error code.
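A sketch of a "parsing only" run, assuming the standard Checkstyle CLI (`java -jar <all-jar> -c <config> <path>`). It uses a single cheap check rather than an empty TreeWalker because, depending on the version, TreeWalker may skip building the AST when it has no checks at all. EmptyStatement is only an illustrative choice, and setting its severity to `ignore` is assumed to keep violations out of both the output and the exit code. The helper names and paths are hypothetical.

```python
from pathlib import Path

# One cheap check under TreeWalker, so every file is still parsed into an AST
# while almost no time is spent in check logic.
PARSE_ONLY_CONFIG = """<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <module name="EmptyStatement">
      <property name="severity" value="ignore"/>
    </module>
  </module>
</module>
"""


def write_config(path):
    """Write the parse-only configuration to path and return it as a Path."""
    path = Path(path)
    path.write_text(PARSE_ONLY_CONFIG)
    return path


def build_command(jar, config, source_dir):
    """Argument list for running the Checkstyle CLI jar on a source directory."""
    return ["java", "-jar", str(jar), "-c", str(config), str(source_dir)]
```

For example, `build_command("target/checkstyle-all.jar", write_config("parse-only.xml"), "jdk/src")` yields a command that the timing step can wrap with time.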

As soon as this basic proof of concept works well, we can extend it to memory and other parameters. This basic implementation should be able to catch a regression if some PR reverts a performance optimization.

Lmh-java · 2024-04-01T02:06:51Z

@romani I have updated my plan (#14599 (comment)). Please take a look. I will work on the POC in my local fork repo now.

romani · 2024-04-01T12:43:42Z

> Git clone JDK repo as the benchmark target.
> Download the baseline file from our repo and parse it.
> Git clone Checkstyle from the patch branch, validate, and package.

This is already done by all CI jobs for you.
The baseline file should be in our repository, in the .ci folder.
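A minimal sketch of the baseline check, assuming the file under .ci holds a single number (baseline run time in seconds). The file name, the 10% threshold, and the function names are hypothetical placeholders, not what the project actually uses.

```python
from pathlib import Path

# Hypothetical file name and format: a single number, the baseline
# execution time in seconds, committed under the .ci folder.
BASELINE_FILE = Path(".ci/performance-baseline.txt")
THRESHOLD_PERCENT = 10.0  # assumed tolerance; tune against observed CI noise


def read_baseline(path):
    """Read the single float stored in the baseline file."""
    return float(Path(path).read_text().strip())


def regression_exit_code(baseline, measured, threshold_percent):
    """Return 1 (fail) if measured exceeds baseline by more than threshold_percent, else 0."""
    slowdown = (measured - baseline) / baseline * 100.0
    print(f"baseline={baseline:.2f}s measured={measured:.2f}s change={slowdown:+.1f}%")
    return 1 if slowdown > threshold_percent else 0
```

In CI, the script would end with something like `sys.exit(regression_exit_code(read_baseline(BASELINE_FILE), measured, THRESHOLD_PERCENT))`, so a slowdown beyond the threshold fails the job.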


Lmh-java · 2024-04-02T18:54:34Z

@romani Sure, I have created a PR #14754. Please take a look. :)

nrmancuso · 2024-05-02T11:58:42Z

Closed via #14754

nrmancuso · 2024-05-02T11:59:22Z

I see that we did not add memory stats, as I suggested above. Let's use this script in practice for a while and see if we really need such features :)

Lmh-java · 2024-05-03T03:29:32Z

> I see that we did not add memory stats, as I suggested above. Let's use this script in practice for a while and see if we really need such features :)

@nrmancuso Yes, we decided to go with the bare minimum first. If we decide to implement memory stats in the future, please let me know. I can do that quickly, since I had that feature in an early version of the PR.

nrmancuso mentioned this issue Mar 3, 2024

Issue #14566: Improve lexer performance #14568

Merged

nrmancuso self-assigned this Mar 4, 2024

romani added approved miscellaneous labels Mar 17, 2024

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Build performance regression test

1b68c0c

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

8ab6877

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Merge pull request #2 from Lmh-java/minghao/performance-regression-test

b2c0fd4

Issue checkstyle#14599: Build performance regression test

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

ff15b1c

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

24ae78e

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

f7042fe

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

2573e7b

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

881876e

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

7b6d6bd

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

530a61b

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

e7d7b8a

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 1, 2024

Issue checkstyle#14599: Test performance CI

57b5bfc

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 2, 2024

Issue checkstyle#14599: Test performance CI

e4982bc

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 2, 2024

Issue checkstyle#14599: Test performance CI

2ddcca3

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 2, 2024

Issue checkstyle#14599: Test performance CI

cbe08f0

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 2, 2024

Issue checkstyle#14599: Test performance CI

79e8aee

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 2, 2024

Issue checkstyle#14599: Test performance CI

5974239

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 2, 2024

Issue checkstyle#14599: Create performance regression test CI

c343fb7

Lmh-java mentioned this issue Apr 2, 2024

Issue #14599: Create performance regression test CI #14754

Merged

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 2, 2024

Issue checkstyle#14599: Create performance regression test CI

290d9fa

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

2a10b0d

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

d2a87dc

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

8e4fd8b

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

b83b127

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

26ba123

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

7571762

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

ae85131

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 4, 2024

Issue checkstyle#14599: Create performance regression test CI

cb8ef90

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 5, 2024

Issue checkstyle#14599: Create performance regression test CI

cfbd08a

Lmh-java mentioned this issue Apr 14, 2024

Issue #12542: new TreeWalker property to skip exceptions #14779

Open

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 20, 2024

Issue checkstyle#14599: Create performance regression test CI

a2206f7

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 21, 2024

Issue checkstyle#14599: Create performance regression test CI

e3060d7

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 22, 2024

Issue checkstyle#14599: Create performance regression test CI

c7822f9

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 22, 2024

Issue checkstyle#14599: Create performance regression test CI

2f18758

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 22, 2024

Issue checkstyle#14599: Create performance regression test CI

e86b762

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 22, 2024

Issue checkstyle#14599: Create performance regression test CI

88b2dae

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue Apr 28, 2024

Issue checkstyle#14599: Create performance regression test CI

05d4265

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue May 1, 2024

Issue checkstyle#14599: Create performance regression test CI

b3e43f1

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue May 1, 2024

Issue checkstyle#14599: Create performance regression test CI

183bafa

Lmh-java added a commit to Lmh-java/checkstyle that referenced this issue May 1, 2024

Issue checkstyle#14599: Create performance regression test CI

a4cc442

nrmancuso pushed a commit that referenced this issue May 2, 2024

Issue #14599: Create performance regression test CI

d99bd0d

github-actions bot added this to the 10.16.1 milestone May 2, 2024

nrmancuso closed this as completed May 2, 2024
