
[tests] Make the 'test_html_copy_source' case agnostic of test evaluation order. #12118

Open · wants to merge 2 commits into base: master

Conversation

jayaddison
Contributor

jayaddison commented Mar 17, 2024

Feature or Bugfix

  • Bugfix

Purpose

Detail

  • Use a unique srcdir during the test_html_copy_source test case so that it can succeed even if other tests cases that use the same testroot modify their source files before it runs.
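
As a rough sketch of that approach (the testroot and srcdir names below are illustrative and assume Sphinx's own test fixtures; they are not necessarily the exact values used in this PR):

```python
import pytest

# Illustration only: pin this test to its own copy of the testroot by passing
# an explicit, unique srcdir, so earlier tests that mutate the shared testroot
# cannot affect it.  The testroot/srcdir values here are placeholders.
@pytest.mark.sphinx('html', testroot='html-assets',
                    srcdir='test_html_copy_source')
def test_html_copy_source(app):
    app.build()
    # assertions against the files copied into app.outdir would go here
```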

Relates

@jayaddison
Contributor Author

(This is ready for review: I intend to open one pull request per test-independence fixup. My guess is that a small number of the tests will be extremely difficult to resolve, compared to this one.)

@picnixz
Member

picnixz commented Mar 18, 2024

Thank you for this one, but in the future we'll just use pytest.mark.isolate() instead. IMO, using an explicit srcdir is not recommended, because you must be sure that it's indeed unique across the test suite. But this means that, as a contributor, you must be well-versed in the rest of the test suite (or at least in the rest of the module).

Technically speaking, the minimal isolation will guarantee that you are in different folders, since it's based on 1) the test module/class and 2) the sphinx marker arguments.
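
To illustrate that idea only (this is not the actual implementation of the planned isolation, just a sketch of how a unique folder name could be derived from a test's identity and its sphinx marker arguments):

```python
import hashlib

def derive_srcdir(node_id: str, marker_args: tuple, marker_kwargs: dict) -> str:
    """Sketch: build a per-test source-directory name from the test's node ID
    (module/class/function) and its sphinx marker arguments, so two tests only
    end up in the same folder when both of those match."""
    key = repr((node_id, marker_args, sorted(marker_kwargs.items())))
    return 'src-' + hashlib.sha1(key.encode('utf-8')).hexdigest()[:12]
```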

As such, I'd prefer to wait for part 3 to be merged before we start fixing the current test suite.

@jayaddison
Contributor Author

Thank you for this one, but in the future we'll just use pytest.mark.isolate() instead. IMO, using an explicit srcdir is not recommended, because you must be sure that it's indeed unique across the test suite. But this means that, as a contributor, you must be well-versed in the rest of the test suite (or at least in the rest of the module).

Technically speaking, the minimal isolation will guarantee that you are in different folders, since it's based on 1) the test module/class and 2) the sphinx marker arguments.

As such, I'd prefer to wait for part 3 to be merged before we start fixing the current test suite.

Fair points. When I run the test suite locally with pytest-randomly -- admittedly with quite a few tests skipped due to missing dependencies -- I am down to fewer than 10 failing tests (more like 6 after merging these fixes).

That makes me wonder whether we should fix the current test suite first, enable random ordering, and then add parallelism.

I don't want to block progress, but I'd prefer to steer towards simplicity and robustness first, before adding what may admittedly be necessary complexity.

@picnixz
Member

picnixz commented Mar 18, 2024

Ok, I can give you the tests that are known to fail and the reasons why:

  • In test_ext_autodoc_autoclass.py: all tests missing the @pytest.mark.sphinx('html', testroot='ext-autodoc') marker (if executed individually, they will fail). This can be done in a separate PR.
  • C/C++ tests with warnings: some tests have inter-dependencies due to how they are executed. For instance, tests/test_domains/test_domain_cpp.py::test_domain_cpp_build_semicolon fails because one warning is not counted (and that specific warning does not seem to appear if this test runs after another one, but I don't know which one). Actually, the filter_warnings function is incompatible with a custom srcdir, so it needs to be updated.
  • tests/test_builders/test_build_latex.py::test_latex_table_custom_template_caseC: the group of tests for LaTeX tables has side-effects, and depending on the order, the output is messed up.
  • tests/test_extensions/test_ext_autosummary.py::test_get_items_summary: likely due to imports (use --random-order-seed=610755 with the pytest-random-order plugin to see this). This one can be fixed using a custom unique srcdir.
  • tests/test_extensions/test_ext_autosummary.py: some other tests need to unload their modules from sys.modules.
  • Every test that modifies its sources is prone to side-effects.

@jayaddison
Contributor Author

Could you review these ideas for fixing each category:

  • For tests that modify their sources: explicit (and OK to be manual, for now) srcdir.
  • The autodoc and autosummary tests seem hard to isolate, because the extensions themselves have side-effects. I don't know what to suggest for those.
  • For the others: fix individually and close.

...and then enable a test suite with pytest-randomly in CI to catch regressions.

If I'm creating unnecessary work, then let's not do this. But it feels achievable to me (apart from the autodoc/autosummary tests... skip those?).

@jayaddison
Contributor Author

Ah, I should add: linkcheck is a problem too. Random ordering doesn't work well with the test HTTP server threads (I'm working on a possible fix -- isolating the ports as you suggested).
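
For reference, the standard trick for isolating test-server ports (a minimal sketch, not Sphinx's actual linkcheck test utilities) is to bind to port 0 and let the OS pick a free port per server:

```python
import http.server
import threading

def start_test_http_server():
    """Sketch: binding to port 0 makes the OS assign an unused port, so
    concurrently started test servers (e.g. under xdist or reordered tests)
    cannot collide on a hard-coded port number."""
    server = http.server.ThreadingHTTPServer(
        ('localhost', 0), http.server.SimpleHTTPRequestHandler)
    port = server.server_address[1]  # the port actually chosen by the OS
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, port
```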

@picnixz
Member

picnixz commented Mar 18, 2024

For tests that modify their sources: explicit (and OK to be manual, for now) srcdir.

Not always, because there are some helper functions that filter warning messages based on the testroot name... which is not used if you use a custom srcdir (so you either need a custom srcdir with the testroot name inside, or you need to fix the filter function...).
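
One way to satisfy that constraint, if we do go with explicit srcdirs (illustrative names only, assuming the filter matches on the testroot name appearing in the directory name):

```python
import pytest

# Illustration only: keep the testroot name as a prefix of the custom srcdir,
# so helpers that match warnings on the testroot name still recognise it.
# The testroot and srcdir values here are placeholders.
@pytest.mark.sphinx('html', testroot='domain-cpp',
                    srcdir='domain-cpp-build-semicolon')
def test_domain_cpp_build_semicolon(app):
    app.build()
```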

The autodoc and autosummary tests seem hard to isolate, because the extensions themselves have side-effects. I don't know what to suggest for those.

That's easy to patch. Just use a fixture that unloads modules at the end of each test (make rollback_sysmodules an autouse fixture; we can improve that one later, leave it to me).
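
A minimal sketch of the shape of such a fixture (Sphinx's existing rollback_sysmodules helper may differ in detail; this only shows the autouse idea):

```python
import sys
import pytest

@pytest.fixture(autouse=True)
def rollback_sysmodules():
    """Sketch: snapshot sys.modules before each test and drop anything that
    was imported during the test, so modules imported by autodoc/autosummary
    do not leak into later tests."""
    saved = frozenset(sys.modules)
    try:
        yield
    finally:
        for name in list(sys.modules):
            if name not in saved:
                del sys.modules[name]
```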

For the others: fix individually and close.

That's the main issue: determining the "others" is a challenge. Even with random-order, you can have no failures. But random-order + xdist might uncover some failures. It's really hard to know now what the bad tests are...

Maybe we should review the whole test suite and extract the bad tests we can think of...?

@jayaddison
Contributor Author

jayaddison commented Mar 18, 2024

The autodoc and autosummary tests seem hard to isolate, because the extensions themselves have side-effects. I don't know what to suggest for those.

That's easy to patch. Just use a fixture that unloads modules at the end of each test (make rollback_sysmodules an autouse fixture; we can improve that one later, leave it to me).

Really? Ok! Thank you.

For the others: fix individually and close.

That's the main issue: determining the "others" is a challenge. Even with random-order, you can have no failures. But random-order + xdist might uncover some failures. It's really hard to know now what the bad tests are...

Maybe we should review the whole test suite and extract the bad tests we can think of...?

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can. If the situation isn't near-complete by next weekend, then we revert to Plan 1 (even if I'm reluctant about it at the time -- a note to my future self).

@picnixz
Member

picnixz commented Mar 18, 2024

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can

Is there a script that can spot all the errors? Running many instances of the test suite would likely take very long and would monopolize the CI/CD resources, so...

@jayaddison
Contributor Author

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can

Is there a script that can spot all the errors? Running many instances of the test suite would likely take very long and would monopolize the CI/CD resources, so...

After a test begins failing due to ordering concerns, there is a way to backtrack to get a reasonable degree of confidence about where the interacting tests might exist.

For example: for multiple --randomly-seed values: run the affected test individually; if it still passes, then run tests in that module only randomly; if it still passes, run tests in that directory randomly, ...
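
Roughly, as a script (a sketch of the procedure just described; it shells out to pytest with pytest-randomly's --randomly-seed option, and the scope paths are placeholders to fill in for the failing test):

```python
import subprocess

def reproduces_failure(target: str, seed: int) -> bool:
    """Run pytest on *target* with a fixed pytest-randomly seed and report
    whether anything failed (non-zero exit status)."""
    cmd = ['python', '-m', 'pytest', target, f'--randomly-seed={seed}']
    return subprocess.run(cmd).returncode != 0

def narrowest_failing_scope(failing_test: str, module: str, directory: str,
                            seeds=(1, 2, 3)) -> str:
    """Sketch of the backtracking idea: try the failing test alone, then its
    module, then its directory, each under a few seeds; the first scope that
    reproduces the failure is where the interfering test most likely lives."""
    for scope in (failing_test, module, directory):
        if any(reproduces_failure(scope, seed) for seed in seeds):
            return scope
    return directory  # nothing reproduced it; fall back to the widest scope
```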

@jayaddison
Contributor Author

(Whether that exists as a reusable pytest plugin or built-in functionality... I'm not sure.)

@jayaddison
Contributor Author

My suggestion would be: make this week a week of pytest-randomly in continuous integration, and smoke out what we can

Is there a script that can spot all the errors? Running many instances of the test suite would likely take very long and would monopolize the CI/CD resources, so...

After a test begins failing due to ordering concerns, there is a way to backtrack to get a reasonable degree of confidence about where the interacting tests might exist.

For example: for multiple --randomly-seed values: run the affected test individually; if it still passes, then run tests in that module only randomly; if it still passes, run tests in that directory randomly, ...

The closest thing I've found so far is https://github.com/pytest-dev/pytest-random-order (less well-known than pytest-randomly, it seems).

It can shuffle tests at the class/module/package/global level.

However, it doesn't do one thing that would be really helpful for us: systematically narrowing down problem dependencies between tests (B only passes when A runs first) or interferences between tests (B begins failing when A runs first).

If all tests succeed when run globally in their usual order, and a single test (say the 5th of 10) within a shuffled class fails under random ordering, then I think that implies the interfering test must be one of the four that ran before it. A tool could then reshuffle those five, using something like bisection to efficiently determine where the interference begins.
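
A sketch of that bisection (the runner shells out to pytest; it assumes the failure is deterministic for a given prefix and that pytest preserves the order of node IDs given on the command line):

```python
import subprocess

def fails_after(prefix: list[str], victim: str) -> bool:
    """Run the candidate prefix followed by the victim test, in that order
    (-p no:randomly disables pytest-randomly's reshuffling for the probe run)."""
    cmd = ['python', '-m', 'pytest', '-p', 'no:randomly', *prefix, victim]
    return subprocess.run(cmd).returncode != 0

def find_interfering_test(prefix: list[str], victim: str) -> str:
    """Bisection sketch: find the shortest prefix of earlier tests that still
    makes the victim fail; its last element is the prime suspect.  Assumes the
    victim passes with an empty prefix and fails with the full prefix."""
    lo, hi = 0, len(prefix)  # invariant: prefix[:lo] passes, prefix[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_after(prefix[:mid], victim):
            hi = mid
        else:
            lo = mid
    return prefix[hi - 1]
```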

@picnixz
Member

picnixz commented Mar 20, 2024

I'm using random-order because it's compatible with xdist.
