Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SUREFIRE-1967] Fix parallel test ordering to prevent high resource consumption #407

Conversation

findepi
Copy link
Contributor

@findepi findepi commented Dec 14, 2021

[ https://issues.apache.org/jira/browse/SUREFIRE-1967 ]

Before the change, TestNG run from Surefire can execute @BeforeClass
on many different test classes/instances, without invoking @AfterClass
yet, leading to high resource utilization. This is not observed when
tests are invoked via a suite file. This is because XmlClass.m_index
field is used by TestNG to order test execution and this field used not
to be set by Surefire. This commit lets Surefire initialize XmlClass
object in a similar manner to how TestNG suite file XML parser does.

Following this checklist to help us incorporate your
contribution quickly and easily:

  • Make sure there is a JIRA issue filed
    for the change (usually before you start working on it). Trivial changes like typos do not
    require a JIRA issue. Your pull request should address just this issue, without
    pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [SUREFIRE-XXX] - Fixes bug in ApproximateQuantiles,
    where you replace SUREFIRE-XXX with the appropriate JIRA issue. Best practice
    is to use the JIRA issue title in the pull request title and in the first line of the
    commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean install to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the integration tests successfully (mvn -Prun-its clean install).

If your pull request is about ~20 lines of code you don't need to sign an
Individual Contributor License Agreement if you are unsure
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@findepi
Copy link
Contributor Author

findepi commented Dec 14, 2021

cc @losipiuk @findinpath

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from 220af5f to e60a302 Compare December 14, 2021 17:15
{
throw new RuntimeException( e );
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we have here reflective access for the m_index field as well?

Instead of setting the m_index via reflection we could use the appropriate constructor to set it:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is compiled against TestNG 5.10 and that version doesn't have the index field in XmlClass, hence use of reflection.

Instead of setting the m_index via reflection we could use the appropriate constructor to set it:

Yes, I can lookup a constructor taking (String.class, int.class) arguments. It's not as robust or future-proof as finding a setIndex method though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add a line with a comment regarding TestNG 5.10. It was very good point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi It would be an ideal situation to ensure same behavior in both versions. So in that case the constructor would help us XmlClass(String className, int index, boolean loadClasses). Am I right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two constructors handle the same information across versions
XmlClass(String className, int index, boolean loadClasses)
XmlClass(String name, Boolean declaredClass, int index)
@juherr Can you ensure that the constructors, at least two, would not be removed in the future? Difficult question for IT development, I know. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tibor17 it's possible to select and invoke one of the constructors, but i don't think the added complexity is worth the effort (i am mostly concerned about future code readability and maintainability). Note that after all, we're solving a problem that exists in TestNG >= 6.3, and TestNG 5.x is not going to change. The m_index field is used in TestNG 5.x when preserve-order is set, but it's not possible to set it when invoking tests via surefire. More precisely, preserve-order seems to require a suite file, but then TestNG will populate the m_index, not us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add a line with a comment regarding TestNG 5.10. It was very good point.

You mean this one:

The code is compiled against TestNG 5.10 and that version doesn't have the index field in XmlClass, hence use of reflection. ?

I wouldn't want to put exact TestNG 5.10 version in the source, as it's subject to change, but i added explanatory comment above XML_CLASS_SET_INDEX constant:

// Using reflection because XmlClass.setIndex is available since TestNG 6.3
// XmlClass.m_index field is available since TestNG 5.13, but prior to 6.3 required invoking constructor
// and constructor XmlClass constructor signatures evolved over time.

i tried to make it convey same meaning: that we have to use the reflection.

@findinpath
Copy link

Please do take a look at

https://github.com/cbeust/testng/blob/master/testng-core/src/main/java/org/testng/internal/DynamicGraphHelper.java#L82-L89

It seems that the orders of the classes is not preserved when running the tests in parallel.

@findepi
Copy link
Contributor Author

findepi commented Dec 14, 2021

It seems that the orders of the classes is not preserved when running the tests in parallel.

When you run without parallel, i didn't observe high resource consumption at all -- in my simple test project the tests are being setup and torn down in turns, with no two tests being initialized at the same time

I tested the surefire+testng behavior before and after the change, with parallel="methods" and thread count 2.
I used TestNG versions 6.10 (as currently used in Trino), and 7.3.0 (the newest that worked with surefire 3.0.0-M5, surefire 3.0.0-M6 doesn't work with my local maven version). For both testng versions, before the change i would get high number of test classes initialized concurrently, and not torn down yet.

After the change, i wouldn't see more than 2 (test thread count) test classes initialized at the same time.

@findepi
Copy link
Contributor Author

findepi commented Dec 14, 2021

@Tibor17 This is my first contribution here. i've signed & sent the CLA. Would you mind approving the CI to run?

@slawekjaranowski
Copy link
Member

Does this change has no influences on existing tests?

Can you prepare some test which can confirm this behavior?
Start from unit test.

@findepi
Copy link
Contributor Author

findepi commented Dec 15, 2021

@slawekjaranowski thanks for your feedback

Does this change has no influences on existing tests?

I've completed run of mvn -Prun-its clean install

$ mvn -Prun-its clean install
[....]
[INFO] Reactor Summary for Apache Maven Surefire 3.0.0-M6-SNAPSHOT:
[INFO]
[INFO] Apache Maven Surefire .............................. SUCCESS [  3.785 s]
[INFO] SureFire Logger API ................................ SUCCESS [  3.929 s]
[INFO] SureFire API ....................................... SUCCESS [ 18.091 s]
[INFO] Surefire Extensions API ............................ SUCCESS [  3.599 s]
[INFO] Surefire Extensions SPI ............................ SUCCESS [  1.129 s]
[INFO] SureFire Booter .................................... SUCCESS [ 12.088 s]
[INFO] Maven Surefire Test-Grouping Support ............... SUCCESS [  3.121 s]
[INFO] SureFire Providers ................................. SUCCESS [  0.399 s]
[INFO] Shared JUnit3 Provider Code ........................ SUCCESS [  2.562 s]
[INFO] Shared Java 5 Provider Base ........................ SUCCESS [  2.905 s]
[INFO] Shared JUnit4 Provider Code ........................ SUCCESS [  3.168 s]
[INFO] Shared JUnit48 Provider Code ....................... SUCCESS [  5.048 s]
[INFO] SureFire JUnit Runner .............................. SUCCESS [  2.701 s]
[INFO] SureFire JUnit4 Runner ............................. SUCCESS [  2.712 s]
[INFO] Maven Surefire Common .............................. SUCCESS [ 51.055 s]
[INFO] SureFire JUnitCore Runner .......................... SUCCESS [02:17 min]
[INFO] SureFire JUnit Platform Runner ..................... SUCCESS [  5.813 s]
[INFO] SureFire TestNG Utils .............................. SUCCESS [  2.796 s]
[INFO] SureFire TestNG Runner ............................. SUCCESS [  3.960 s]
[INFO] ShadeFire JUnit3 Provider .......................... SUCCESS [  1.236 s]
[INFO] Surefire Report Parser ............................. SUCCESS [  3.479 s]
[INFO] Maven Surefire Plugin .............................. SUCCESS [  5.551 s]
[INFO] Maven Failsafe Plugin .............................. SUCCESS [ 26.472 s]
[INFO] Maven Surefire Report Plugin ....................... SUCCESS [  8.341 s]
[INFO] Maven Surefire Integration Tests ................... SUCCESS [  01:06 h]
[INFO] Surefire Shared Utils .............................. SUCCESS [  1.649 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:11 h
[INFO] Finished at: 2021-12-15T10:20:56+01:00
[INFO] ------------------------------------------------------------------------

What else i should do?

Can you prepare some test which can confirm this behavior?
Start from unit test.

I am not sure how i can test this in a unit manner.

I found tests in https://github.com/apache/maven-surefire/tree/master/surefire-its/src/test/resources and I think i would be able to add a test like that. I already have a local project I am testing with, and would need to clean it up.

@slawekjaranowski please advise.

@slawekjaranowski
Copy link
Member

Look into org.apache.maven.surefire.its.CheckTestNgGroupThreadParallelIT
and try prepare similar with your case, the most difficult will be prepare assertions ....

@findepi
Copy link
Contributor Author

findepi commented Dec 15, 2021

@slawekjaranowski thanks, this looks awesome. Can you please share the fastest way to run single test either from IDE or command line?

i tried this

mvn -Prun-its clean install -Dtest=CheckTestNgGroupThreadParallelIT -pl :surefire-its

and got

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.maven.surefire.its.CheckTestNgGroupThreadParallelIT
[INFO] [stdout] Using JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home in forked launcher.
...
[ERROR] Plugin org.apache.maven.plugins:maven-surefire-plugin:null or one of its dependencies could not be resolved: org.apache.maven.plugins:maven-surefire-plugin:jar:null ...

should i be setting surefire.version by hand?

@slawekjaranowski
Copy link
Member

try

 mvn verify -pl surefire-its -P run-its -Dit.test=CheckTestNgGroupThreadParallelIT

magic it.test instead of test

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from e60a302 to 0725ea4 Compare December 15, 2021 19:55
@findepi
Copy link
Contributor Author

findepi commented Dec 15, 2021

@slawekjaranowski thanks, added a test.
At least locally, it did fail before the PR's change and succeeded afterwards.

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from 0725ea4 to 5d55203 Compare December 15, 2021 20:56
@findepi
Copy link
Contributor Author

findepi commented Dec 15, 2021

@slawekjaranowski thanks for review, applied comments

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from 5d55203 to 3d44630 Compare December 16, 2021 08:44
@findepi
Copy link
Contributor Author

findepi commented Dec 16, 2021

@slawekjaranowski thanks for your review, comments applied.
i also added a test run with some older testng version. i determined it's useful, because the solution relies on reflection, so could silently stop working.

@findepi
Copy link
Contributor Author

findepi commented Dec 20, 2021

@slawekjaranowski i've received a confirmation that my signed CLA was received by Apache.
Please let me know what else I should change here.

@slawekjaranowski
Copy link
Member

@findepi - please follow dev mailing list.

@findepi
Copy link
Contributor Author

findepi commented Dec 21, 2021

@slawekjaranowski do you want me to sign up to Maven Developer List (dev-subscribe@maven.apache.org)?

@slawekjaranowski
Copy link
Member

simply send email for given address and follow on response.

@findepi
Copy link
Contributor Author

findepi commented Dec 22, 2021

@slawekjaranowski thanks, done

@findepi
Copy link
Contributor Author

findepi commented Dec 23, 2021

@slawekjaranowski should i do any other changes here? is there a chance to get this into 3.0.0-M6?
As we add more tests, we face dilemma whether to increase tests' heap size or reduce parallelism. While GitHub Actions provide something like 7 GB memory, we are heavy testcontainers users (Trino plugins require integration tests), so we need to put aside quite some memory for docker containers. Thus we lean towards removing test parallelism, but this will increase build times. (just sharing for context)

@Tibor17
Copy link
Contributor

Tibor17 commented Dec 24, 2021

@findepi
Pls do not use return XmlClass.class.getMethod( "setIndex", int.class );. We have class something like SurefireReflector and there is setter reflector. It is a utility widely used in Surefire project and it makes the reflection calls very compact without catching exceptions, etc. Thx

@Tibor17
Copy link
Contributor

Tibor17 commented Dec 24, 2021

@findepi
Your commit in this PR is small one and we do not have to require CLA. If you want to provide some invention property to ASF Maven project like a big feature from another project as I did in 2013 then yes I had to sign CLA. But this is a small fix which is not used in another projects for sure because it is not applicable to another projects and where you use Java and Maven API.

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from 3d44630 to bb78b57 Compare December 24, 2021 11:54
@findepi
Copy link
Contributor Author

findepi commented Dec 24, 2021

@Tibor17 thanks for your review.
I didn't notice the ReflectionUtils class even though it's already used in TestNGExecutor. This indeed simplifies the code flow, as it handles reflection-related exceptions. Please take another look.

@findepi
Copy link
Contributor Author

findepi commented Jan 8, 2022

Rebased to rerun the CI. @slawekjaranowski can you please approve the CI run?

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch 2 times, most recently from 2e03cb9 to b9fe306 Compare January 10, 2022 10:03
@findepi
Copy link
Contributor Author

findepi commented Jan 10, 2022

Apologies, I've accidentally removed the test exclusion with TestNG 5 on JDK 17, so inevitably the build was failing. Brought it back now. The build is green in my fork https://github.com/findepi/maven-surefire/actions/runs/1676804939

@slawekjaranowski can you please approve the build run here?

@findepi
Copy link
Contributor Author

findepi commented Jan 11, 2022

thanks for approving the PR run.
I see that JUnit47ParallelIT.forcedShutdownVerifyingLogs failed in macOS-latest jdk-8-temurin.
I hope that's not related, but I am not sure what I should do. @slawekjaranowski @Tibor17 please advise.

@Tibor17
Copy link
Contributor

Tibor17 commented Jan 12, 2022

JUnit47ParallelIT.forcedShutdownVerifyingLogs is not related to this issue since the IT is not related to TestNG.

@Tibor17
Copy link
Contributor

Tibor17 commented Jan 12, 2022

Matcher matcher = Pattern.compile( "^(\\d+)\\..*$" ).matcher( System.getProperty( "java.version" ) );
         assertTrue( matcher.matches() );
int javaVersionFirstNumber = Integer.parseInt( matcher.group( 1 ) );
         assumeTrue( "TestNG 5.13 does not work with Java 17", javaVersionFirstNumber < 17 );

can be replaced with

import static org.apache.maven.surefire.its.fixture.HelperAssertions.assumeJavaMaxVersion;
assumeJavaMaxVersion( 16 );

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from b9fe306 to 3a0fdf4 Compare January 12, 2022 16:03
@findepi
Copy link
Contributor Author

findepi commented Jan 12, 2022

can be replaced with

import static org.apache.maven.surefire.its.fixture.HelperAssertions.assumeJavaMaxVersion;

nice, thanks @Tibor17, updated.

@Tibor17
Copy link
Contributor

Tibor17 commented Jan 12, 2022

One more thing. The Base class has try-catch. Please use throws Exception , it is typical for tests to process the exception by itself in the report, not necessary to rethrow exceptions.

@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from 3a0fdf4 to 9582fb7 Compare January 12, 2022 16:15
{
throw new RuntimeException( e );
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add a line with a comment regarding TestNG 5.10. It was very good point.

{
throw new RuntimeException( e );
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi It would be an ideal situation to ensure same behavior in both versions. So in that case the constructor would help us XmlClass(String className, int index, boolean loadClasses). Am I right?

{
throw new RuntimeException( e );
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two constructors handle the same information across versions
XmlClass(String className, int index, boolean loadClasses)
XmlClass(String name, Boolean declaredClass, int index)
@juherr Can you ensure that the constructors, at least two, would not be removed in the future? Difficult question for IT development, I know. :-)

<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>${surefire.version}</version>
<configuration>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the CPU is overloaded in Jenkins, this hypothesis may be broken and parallel threads turn to serial. Don;t rely on parallel execution even if you run parallel tests. Let's see the Jenkins test result especially on Windows...

@Tibor17
Copy link
Contributor

Tibor17 commented Jan 12, 2022

@findepi
One more thing. Is this new method tryGetConstructor called in some unit test?
It would be great to have a coverage. I know your implementation is okay but it is useful for our test coverage graph to keep the coverage on the same level or let it grow.

…onsumption

Before the change, TestNG run from Surefire can execute `@BeforeClass`
on many different test classes/instances, without invoking `@AfterClass`
yet, leading to high resource utilization. This is not observed when
tests are invoked via a suite file. This is because `XmlClass.m_index`
field is used by TestNG to order test execution and this field used not
to be set by Surefire. This commit lets Surefire initialize `XmlClass`
object in a similar manner to how TestNG suite file XML parser does.
@findepi findepi force-pushed the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch from 9582fb7 to f291fce Compare January 12, 2022 20:46
@slawekjaranowski
Copy link
Member

@Tibor17
Copy link
Contributor

Tibor17 commented Jan 13, 2022

Currently I have forced rebuild of the Jenkins build because yesterday our INFRA team restarted the Jenkins and the previous job did not finish.

@slawekjaranowski
Copy link
Member

slawekjaranowski commented Jan 13, 2022

There new commit appear - so I recreated branch: https://github.com/apache/maven-surefire/tree/SUREFIRE-1967
build start once again
last commit: f291fce

@findepi
Copy link
Contributor Author

findepi commented Jan 14, 2022

@slawekjaranowski thanks for triggering the Jenkins build.
I see the GHA succeeded on https://github.com/apache/maven-surefire/tree/SUREFIRE-1967 branch (https://github.com/apache/maven-surefire/actions/runs/1693777765). How was the Jenkins build?

@Tibor17 Tibor17 merged commit 909637c into apache:master Jan 14, 2022
@Tibor17
Copy link
Contributor

Tibor17 commented Jan 14, 2022

@findepi
Thx for contributing!

@findepi findepi deleted the findepi/fix-parallel-test-ordering-to-prevent-high-resource-consumption-e8385a branch January 14, 2022 14:06
@findepi
Copy link
Contributor Author

findepi commented Jan 14, 2022

Thank you @Tibor17 for the merge, and thanks @Tibor17 @slawekjaranowski for all the review feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants