
[tests] Make the 'test_html_copy_source' case agnostic of test evaluation order. #12118

Open · wants to merge 2 commits into base: master

Conversation

jayaddison
Contributor

jayaddison commented Mar 17, 2024

Feature or Bugfix

  • Bugfix

Purpose

Detail

  • Use a unique srcdir during the test_html_copy_source test case so that it can succeed even if other tests cases that use the same testroot modify their source files before it runs.
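
As a rough sketch of that approach (the testroot and srcdir names below are illustrative and assume Sphinx's own test fixtures; they are not necessarily the exact values used in this PR):

```python
import pytest

# Illustration only: pin this test to its own copy of the testroot by passing
# an explicit, unique srcdir, so earlier tests that mutate the shared testroot
# cannot affect it.  The testroot/srcdir values here are placeholders.
@pytest.mark.sphinx('html', testroot='html-assets',
                    srcdir='test_html_copy_source')
def test_html_copy_source(app):
    app.build()
    # assertions against the files copied into app.outdir would go here
```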

Relates

@jayaddison
Contributor Author

(This is ready for review: I intend to open one pull request per test-independence fixup. My guess is that a small number of the tests will be extremely difficult to resolve, compared to this one.)

@picnixz
Member

picnixz commented Mar 18, 2024

Thank you for this one, but in the future we'll just use pytest.mark.isolate() instead. IMO, using an explicit srcdir is not recommended, because you must be sure that it's indeed unique across the test suite. But this means that, as a contributor, you must be well-versed in the rest of the test suite (or at least in the rest of the module).

Technically speaking, the minimal isolation will guarantee that you are in different folders, since it's based on 1) the test module/class and 2) the sphinx marker arguments.
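
To illustrate that idea only (this is not the actual implementation of the planned isolation, just a sketch of how a unique folder name could be derived from a test's identity and its sphinx marker arguments):

```python
import hashlib

def derive_srcdir(node_id: str, marker_args: tuple, marker_kwargs: dict) -> str:
    """Sketch: build a per-test source-directory name from the test's node ID
    (module/class/function) and its sphinx marker arguments, so two tests only
    end up in the same folder when both of those match."""
    key = repr((node_id, marker_args, sorted(marker_kwargs.items())))
    return 'src-' + hashlib.sha1(key.encode('utf-8')).hexdigest()[:12]
```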

As such, I'd prefer to wait for part 3 to be merged before we start fixing the current test suite.

@jayaddison
Contributor Author

Thank you for this one, but in the future we'll just use pytest.mark.isolate() instead. IMO, using an explicit srcdir is not recommended, because you must be sure that it's indeed unique across the test suite. But this means that, as a contributor, you must be well-versed in the rest of the test suite (or at least in the rest of the module).

Technically speaking, the minimal isolation will guarantee that you are in different folders, since it's based on 1) the test module/class and 2) the sphinx marker arguments.

As such, I'd prefer to wait for part 3 to be merged before we start fixing the current test suite.

Fair points. When I run the test suite locally with pytest-randomly -- admittedly with quite a few tests skipped due to missing dependencies -- I am down to fewer than 10 failing tests (more like 6 after merging these fixes).

That makes me wonder whether we should fix the current test suite first, enable random ordering, and then add parallelism.

I don't want to block progress, but I'd prefer to steer towards simplicity and robustness first, before adding what may admittedly be necessary complexity.

@picnixz
Member

picnixz commented Mar 18, 2024

Ok, I can give you the tests that are known to fail and the reasons why:

  • In test_ext_autodoc_autoclass.py: all tests missing the @pytest.mark.sphinx('html', testroot='ext-autodoc') marker (if executed individually, they will fail). This can be done in a separate PR.
  • C/C++ tests with warnings: some tests have inter-dependencies due to how they are executed. For instance, tests/test_domains/test_domain_cpp.py::test_domain_cpp_build_semicolon fails because one warning is not counted (and that specific warning does not seem to appear if this test runs after another one, but I don't know which one). Actually, the filter_warnings function is incompatible with a custom srcdir, so it needs to be updated.
  • tests/test_builders/test_build_latex.py::test_latex_table_custom_template_caseC: the group of tests for LaTeX tables has side-effects, and depending on the order, the output is messed up.
  • tests/test_extensions/test_ext_autosummary.py::test_get_items_summary: likely due to imports (use --random-order-seed=610755 with the pytest-random-order plugin to see this). This one can be fixed using a custom unique srcdir.
  • tests/test_extensions/test_ext_autosummary.py: some other tests need to unload their modules from sys.modules.
  • Every test that modifies its sources is prone to side-effects.

@jayaddison
Contributor Author

Could you review these ideas for fixing each category:

  • For tests that modify their sources: explicit (and OK to be manual, for now) srcdir.
  • The autodoc and autosummary tests seem hard to isolate, because the extensions themselves have side-effects. I don't know what to suggest for those.
  • For the others: fix individually and close.

...and then enable a test suite with pytest-randomly in CI to catch regressions.

If I'm creating unnecessary work, then let's not do this. But it feels achievable to me (apart from the autodoc/autosummary tests... skip those?).

@jayaddison
Contributor Author

Ah, I should add: linkcheck is a problem too. Random ordering doesn't work well with the test HTTP server threads (I'm working on a possible fix -- isolating the ports as you suggested).
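
For reference, the standard trick for isolating test-server ports (a minimal sketch, not Sphinx's actual linkcheck test utilities) is to bind to port 0 and let the OS pick a free port per server:

```python
import http.server
import threading

def start_test_http_server():
    """Sketch: binding to port 0 makes the OS assign an unused port, so
    concurrently started test servers (e.g. under xdist or reordered tests)
    cannot collide on a hard-coded port number."""
    server = http.server.ThreadingHTTPServer(
        ('localhost', 0), http.server.SimpleHTTPRequestHandler)
    port = server.server_address[1]  # the port actually chosen by the OS
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, port
```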

@picnixz
Member

picnixz commented Mar 18, 2024

For tests that modify their sources: explicit (and OK to be manual, for now) srcdir.

Not always, because there are some helper functions that filter warning messages based on the testroot name... which is not used if you use a custom srcdir (so you either need a custom srcdir with the testroot name inside, or you need to fix the filter function...).
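
One way to satisfy that constraint, if we do go with explicit srcdirs (illustrative names only, assuming the filter matches on the testroot name appearing in the directory name):

```python
import pytest

# Illustration only: keep the testroot name as a prefix of the custom srcdir,
# so helpers that match warnings on the testroot name still recognise it.
# The testroot and srcdir values here are placeholders.
@pytest.mark.sphinx('html', testroot='domain-cpp',
                    srcdir='domain-cpp-build-semicolon')
def test_domain_cpp_build_semicolon(app):
    app.build()
```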

The autodoc and autosummary tests seem hard to isolate, because the extensions themselves have side-effects. I don't know what to suggest for those.

That's easy to patch. Just use a fixture that unloads modules at the end of each test (make rollback_sysmodules an autouse fixture; we can improve that one later, leave it to me).
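
A minimal sketch of the shape of such a fixture (Sphinx's existing rollback_sysmodules helper may differ in detail; this only shows the autouse idea):

```python
import sys
import pytest

@pytest.fixture(autouse=True)
def rollback_sysmodules():
    """Sketch: snapshot sys.modules before each test and drop anything that
    was imported during the test, so modules imported by autodoc/autosummary
    do not leak into later tests."""
    saved = frozenset(sys.modules)
    try:
        yield
    finally:
        for name in list(sys.modules):
            if name not in saved:
                del sys.modules[name]
```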

For the others: fix individually and close.

That's the main issue: determining the "others" is a challenge. Even with random-order, you can have no failures. But random-order + xdist might uncover some failures. It's really hard to know now what the bad tests are...

Maybe we should review the whole test suite and extract the bad tests we can think of...?

@jayaddison
Contributor Author

jayaddison commented Mar 18, 2024

The autodoc and autosummary tests seem hard to isolate, because the extensions themselves have side-effects. I don't know what to suggest for those.

That's easy to patch. Just use a fixture that unloads modules at the end of each test (make rollback_sysmodules an autouse fixture; we can improve that one later, leave it to me).

Really? Ok! Thank you.

For the others: fix individually and close.

That's the main issue: determining the "others" is a challenge. Even with random-order, you can have no failures. But random-order + xdist might uncover some failures. It's really hard to know now what the bad tests are...

Maybe we should review the whole test suite and extract the bad tests we can think of...?

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can. If the situation isn't near-complete by next weekend, then we revert to Plan 1 (even if I'm reluctant about it at the time -- a note to my future self).

@picnixz
Member

picnixz commented Mar 18, 2024

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can

Is there a script that can spot all the errors? Running many instances of the test suite would likely take very long and would monopolize the CI/CD resources, so...

@jayaddison
Contributor Author

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can

Is there a script that can spot all the errors? Running many instances of the test suite would likely take very long and would monopolize the CI/CD resources, so...

After a test begins failing due to ordering concerns, there is a way to backtrack to get a reasonable degree of confidence about where the interacting tests might exist.

For example: for multiple --randomly-seed values: run the affected test individually; if it still passes, then run tests in that module only randomly; if it still passes, run tests in that directory randomly, ...
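
Roughly, as a script (a sketch of the procedure just described; it shells out to pytest with pytest-randomly's --randomly-seed option, and the scope paths are placeholders to fill in for the failing test):

```python
import subprocess

def reproduces_failure(target: str, seed: int) -> bool:
    """Run pytest on *target* with a fixed pytest-randomly seed and report
    whether anything failed (non-zero exit status)."""
    cmd = ['python', '-m', 'pytest', target, f'--randomly-seed={seed}']
    return subprocess.run(cmd).returncode != 0

def narrowest_failing_scope(failing_test: str, module: str, directory: str,
                            seeds=(1, 2, 3)) -> str:
    """Sketch of the backtracking idea: try the failing test alone, then its
    module, then its directory, each under a few seeds; the first scope that
    reproduces the failure is where the interfering test most likely lives."""
    for scope in (failing_test, module, directory):
        if any(reproduces_failure(scope, seed) for seed in seeds):
            return scope
    return directory  # nothing reproduced it; fall back to the widest scope
```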

@jayaddison
Contributor Author

(Whether that exists as a reusable pytest plugin or built-in functionality... I'm not sure.)

@jayaddison
Contributor Author

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can

Is there a script that can spot all the errors? Running many instances of the test suite would likely take very long and would monopolize the CI/CD resources, so...

After a test begins failing due to ordering concerns, there is a way to backtrack to get a reasonable degree of confidence about where the interacting tests might exist.

For example: for multiple --randomly-seed values: run the affected test individually; if it still passes, then run tests in that module only randomly; if it still passes, run tests in that directory randomly, ...

The closest thing I've found so far is https://github.com/pytest-dev/pytest-random-order (less well-known than pytest-randomly, it seems).

It can shuffle tests at the class/module/package/global level.

However, it doesn't do one thing that would be really helpful for us: systematically narrowing down problem dependencies between tests (B only passes when A runs first) or interferences between tests (B begins failing when A runs first).

If all tests succeed when run globally in their usual order, and a single test (say the 5th of 10) within a shuffled class fails under random ordering, then I think that implies the interfering test must be one of the four that ran before it. A tool could then reshuffle those five, using something like bisection to efficiently determine where the interference begins.
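
A sketch of that bisection (the runner shells out to pytest; it assumes the failure is deterministic for a given prefix and that pytest preserves the order of node IDs given on the command line):

```python
import subprocess

def fails_after(prefix: list[str], victim: str) -> bool:
    """Run the candidate prefix followed by the victim test, in that order
    (-p no:randomly disables pytest-randomly's reshuffling for the probe run)."""
    cmd = ['python', '-m', 'pytest', '-p', 'no:randomly', *prefix, victim]
    return subprocess.run(cmd).returncode != 0

def find_interfering_test(prefix: list[str], victim: str) -> str:
    """Bisection sketch: find the shortest prefix of earlier tests that still
    makes the victim fail; its last element is the prime suspect.  Assumes the
    victim passes with an empty prefix and fails with the full prefix."""
    lo, hi = 0, len(prefix)  # invariant: prefix[:lo] passes, prefix[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_after(prefix[:mid], victim):
            hi = mid
        else:
            lo = mid
    return prefix[hi - 1]
```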

@picnixz
Member

picnixz commented Mar 20, 2024

I'm using random-order because it's compatible with xdist.
