
[tests] Implementation of the new testing plugin (backward-compatible) [part 2] #12093

Open · wants to merge 69 commits into master
Conversation

@picnixz (Member) commented Mar 14, 2024

Follows #12089.

This is the smallest possible set of changes that introduces the new plugin, since pytest does not offer the possibility to enable plugins per conftest file (i.e., I cannot test the new plugin while keeping the old one for the other tests).

As such, I need to merge both plugins (the legacy and the new implementation) into one. I could, say, leave out the test part of the new plugin, but that wouldn't make sense since no one would be able to see the plugin in action otherwise (and just pushing the utility functions does not seem right either).

Once #12089 is merged, this one can follow, as it is based on top of it. After that, we need to fix each test individually so that it uses the new features (for now, the plugin is still backwards compatible, though some parts can be removed once we're done with the cleanup).

cc @jayaddison

@picnixz (Member, Author) commented Mar 14, 2024

So, for those interested in reviewing:

There are two major parts in this PR. The first part is the implementation of the plugin itself. Most of the work is done in the internal package where no magic occurs. Markers are processed and arguments are derived according to some strategy that is explained in the comments.

The second part consists of testing the plugin itself, and there's a bit of magic there. As I said before, xdist is capricious: one major drawback is the lack of output capturing. When using pytester, you essentially run an instance of pytest in-process, and you can check things directly in the tests you want. Now, imagine you have two tests and you want to be sure that test1 and test2 use different files. The issue is that you cannot check that in test1 alone, nor in test2 alone, since they need to be executed simultaneously (that is what I want). So you would usually print inside the tests, capture the output, and compare whatever you want.

The issue is that xdist does not allow you to capture ANYTHING you emit with "print". The only way to get something out is to write to disk (which I don't want to do) or to use pytest's report sections together with the '-rA' flag. That way, when test1 and test2 are done, the pytest run you started will show the report sections (even if xdist ran test1 and test2 in parallel), and NOW you can parse that output. This is the job of the "magico" plugin.
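To make the idea concrete, here is a minimal sketch of the report-section trick, assuming a hypothetical record_line fixture (illustrative only; the actual "magico" plugin in this PR is more elaborate):

```python
# Minimal sketch of the report-section trick (illustrative, not the
# actual "magico" implementation).
import pytest

@pytest.fixture
def record_line(request):
    """Collect lines and attach them to this test's report section."""
    lines = []
    yield lines.append
    # Report sections survive xdist: the parent pytest run prints them
    # in its summary when invoked with '-rA', where they can be parsed.
    request.node.add_report_section('call', 'magico', '\n'.join(lines))
```

The pytester-side test can then run pytest with '-rA' (plus '-n2' for xdist) and scan result.outlines for the recorded values.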

Feel free to ask any questions.

@chrisjsewell (Member) commented:

Heya, as per #12089, could you change your initial comment to be more specific about what issues this PR proposes to fix, and how it proposes to fix them?

@picnixz marked this pull request as draft March 14, 2024 17:14
@picnixz (Member, Author) commented Mar 14, 2024

Sure! It's easier to reply with a new comment, I think:

So, for now, this PR is meant to prepare the groundwork needed for fixing #11285. Currently, it only fixes the parts that are "detected" to be bad. More precisely, some tests have side effects because they change their sources (some of them use a shared_result). That's not good, since a test that changes its sources must be independent of every other test (unless it is a parametrized test, in which case the same change happens on every iteration).

There are "3" levels of test isolation. More precisely, the sources directory is constructed as: "TMP_DIR/NAMESPACE/CONFIG/TESTROOT-UNIQUEID" where the NAMESPACE part only depends on the module and the class of the test (i.e., two test functions in the same module are in the same NAMESPACE), "CONFIG" depends on the test configuration (i.e., the parameters passed to pytest.mark.sphinx), "TESTROOT" is the name of the test root being used (just for visual purposes) and "UNIQUEID" is an optional ID to make the directory more unique if needed.

For level 1 (the default), you don't have the "-UNIQUEID" suffix. In addition, the files in "TMP_DIR/NAMESPACE/CONFIG/TESTROOT" are marked as read-only (but only at this level).

Level 2 is only for tests using pytest.mark.parametrize. For those tests, you want to share the same test directory while still being isolated from the rest. It's as if you had level 3 isolation but turned the parametrization into one huge for-loop. With level 3, every parametrization lands in a different folder, which makes the test suite very, very slow! Level 2 has more or less the same effect as using shared_result, but you don't need to care about which shared_result ID you choose (a choice that might come back to bite you later!).

Level 3 ensures that no two tests ever use the same sources directory. In general, if we cannot correctly decide on the isolation level, we use full isolation.
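To make the three levels concrete, here is a hypothetical sketch of how they might surface at the marker level; the isolate argument and its values are assumptions for illustration, not necessarily this PR's actual API:

```python
import pytest

# Level 1 (default): shared sources, marked read-only.
@pytest.mark.sphinx('html', testroot='basic')
def test_level1(app): ...

# Level 2: one fresh directory shared by all parametrizations.
@pytest.mark.parametrize('name', ['a', 'b'])
@pytest.mark.sphinx('html', testroot='basic', isolate='grouped')
def test_level2(app, name): ...

# Level 3: a unique sources directory for this test alone.
@pytest.mark.sphinx('html', testroot='basic', isolate=True)
def test_level3(app): ...
```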


I'll try to find a way to generate smaller paths on Windows since it doesn't like long path names... Actually, the issue comes from Cython, which seems to generate a VERY VERY VERY long path, so I'll need to investigate.

EDIT: It is still possible to pass an explicit srcdir keyword to the Sphinx marker. If you do, NAMESPACE and CONFIG will be '-' and 0 respectively, and TESTROOT will be replaced by that srcdir value, possibly suffixed by some random ID for uniqueness. That way, path lengths on Windows are reduced.

In addition, I reduced the entropy of the random IDs to roughly 32 bits (8 hex chars), since otherwise the paths are too long.
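For instance, a 32-bit suffix can be produced with the standard library:

```python
import secrets

unique_id = secrets.token_hex(4)  # 4 random bytes -> 8 hex chars (~32 bits)
```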

@picnixz (Member, Author) commented Mar 14, 2024

Also, the SharedResult class used a weird construction (you use instances for fixtures but a class variable to hold whatever you want for the whole module). It works, but it's actually cleaner to use the stash object that pytest suggests instead. Also, having the fixture name clash with the marker argument while not being of the same type was confusing to me, so I renamed it (shared_result -> module_cache) but kept the old name for backwards compatibility.
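A minimal sketch of the stash-based replacement, with illustrative names (not necessarily the PR's exact ones):

```python
import pytest

# The cache dict lives on the module node's stash rather than in a
# class variable shared across the whole session.
_MODULE_CACHE_KEY = pytest.StashKey[dict]()

@pytest.fixture
def module_cache(request):
    module = request.node.getparent(pytest.Module)
    return module.stash.setdefault(_MODULE_CACHE_KEY, {})
```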

@picnixz (Member, Author) commented Mar 14, 2024

Aaaand, finally, I've introduced a better lazy app which does what you would expect it to do. For instance, using freshenv=True together with a shared result should not be accepted; likewise for "force_all" (I explained the rationale at the code level).
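Conceptually, the consistency check amounts to something like this (illustrative; the actual validation and its rationale live in the PR's code comments):

```python
from __future__ import annotations

def check_sphinx_marker(*, freshenv: bool = False, force_all: bool = False,
                        shared_result: str | None = None) -> None:
    # A shared (cached) result cannot coexist with options that force
    # the environment or all files to be rebuilt from scratch.
    if shared_result is not None and freshenv:
        raise ValueError('shared_result is incompatible with freshenv=True')
    if shared_result is not None and force_all:
        raise ValueError('shared_result is incompatible with force_all=True')
```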

Resolved review threads (now outdated) on: sphinx/testing/internal/cache.py, sphinx/testing/internal/markers.py (×3), tests/conftest.py
@picnixz marked this pull request as ready for review March 15, 2024 12:46
@jayaddison (Contributor) commented:

In the back of my mind I am thinking: the cost and time savings from this change could be significant - so is it a risk to delay/block it? It could be.

However, when I think about what a pull request to enable parallelism should contain, I would ideally expect that it should be no more than 50 lines of code - an additional dependency, some config changes, some CI workflow changes.

Instead it seems we're adding more code within the codebase to work around the fact that the application is not parallelizable in some fundamental ways. One of those is the in-process mutation of objects/imports that I'm aware autodoc and others do -- but even so my sense is that solving the problem by working backwards towards the origins would be long-term much more effective, because otherwise we're adding yet more layers that become difficult to unpick (that being the history of the test suite and plugin becoming more complex instead of less complex).

The annoying part is that, again, I don't have better suggestions, and so I think my review should be considered a +0 (no positive/negative impact on whether we merge this).

@picnixz (Member, Author) commented Mar 18, 2024

> I don't yet understand all the context, but to me it seems that running our own tests in a parallelizable way is worth decoupling from providing that same functionality to third-party test suites

Then this means not using our own plugin at all and having a plugin at the tests/ level instead. I can do that (and then sphinx/testing would not be changed at all) without too much work.

@picnixz (Member, Author) commented Mar 18, 2024

> the cost and time savings from this change could be significant - so is it a risk to delay/block it? It could be.

Not really. It's not that hard for me to move pieces around or refactor things. But in light of your previous comment, I will move everything to our local tests so that I reduce the downstream risk as much as possible. It's also more effective that way, and I'm confident that I could fix any future issue (if there's an issue, we can just switch back to the old plugin implementation and voilà).

> However, when I think about what a pull request to enable parallelism should contain, I would ideally expect that it should be no more than 50 lines of code - an additional dependency, some config changes, some CI workflow changes.

Yes, that's what I would have expected too, but xdist is not friendly with output, so some things must be done once this plugin is enabled (e.g., anything that uses a print will NOT be rendered). Now, if you enable xdist directly, you need to fix all the side effects in a single PR (otherwise the CI will not work at all!), whereas with my approach of a default isolation strategy, you can fix side effects across multiple PRs.

> Instead it seems we're adding more code within the codebase to work around the fact that the application is not parallelizable in some fundamental ways

The application is parallelizable, but it's not friendly with pytest + xdist at some points.

> One of those is the in-process mutation of objects/imports that I'm aware autodoc and others do -- but even so my sense is that solving the problem by working backwards towards the origins would be long-term much more effective, because otherwise we're adding yet more layers that become difficult to unpick (that being the history of the test suite and plugin becoming more complex instead of less complex).

Half of the side-effect bugs come from autodoc; the other half comes from modifications to the sources that are then kept between tests (thus making them fail if they are executed in an arbitrary order). Technically speaking, my PR (and the follow-ups) will not directly solve the former, since it is a Python-related issue and not a filesystem-related one.

There are hundreds of autodoc tests, and it's difficult to locate which one depends on another, since Sphinx sometimes changes a module's annotations and some modules are used more than once by the autodoc tests (sometimes only part of those modules is used). I have a fixture planned to fix those tests, but it's not in this PR yet, and you cannot really see it in effect unless you enable the parallelization (you could enable the plugin for this subset of tests, but because of the pytest integration, parametrized tests would need special-casing in order to patch the workers' instantiation).


There is ONE isolation strategy that could be achieved by pytest alone, so I'll try that one. It is something Adam tried in #11285 (comment), but it comes at the cost of a 4x slowdown. The slowdown is probably (I think) due to the parametrized tests, and the approach is not compatible with shared_result. I could try to make it compatible, but I'm not sure how good the result would be.

@jayaddison (Contributor) commented:

> > the cost and time savings from this change could be significant - so is it a risk to delay/block it? It could be.

> Not really. It's not that hard for me to move pieces around or refactor things. But in light of your previous comment, I will move everything to our local tests so that I reduce the downstream risk as much as possible. It's also more effective that way, and I'm confident that I could fix any future issue (if there's an issue, we can just switch back to the old plugin implementation and voilà).

I do believe that you'd be able to maintain/support/improve this, yep. However, I'd prefer (selfishly!) for your time to be available for core Sphinx issues instead.

Also selfishly, I don't know how much of my own time I'd be able (or want) to offer to fixing test plugin issues. Creating a certain amount of 'testing gravity' could help us make sure plugins/themes/etc. remain compatible - and if so, perhaps I'd end up feeling satisfied that we're improving ecosystem consistency and would in fact start to help - but we only have a few active maintainers/committers, so I think there's a risk of DDoS'ing ourselves.

(hard to quantify the risks, especially without data, but these are the things I'm thinking about)

@picnixz (Member, Author) commented Mar 18, 2024

Here are some plans I can suggest (honestly, it was fun on my side to play with pytest, so I don't mind the time spent).

Plan 1

  • Continue with what I did, but only for Sphinx itself. We might change the original plugin implementation in the future to make it more robust, but we will not make it xdist-friendly (that will be left to downstream extensions).
  • I move the new implementation of the fixtures into our local tests/. Maybe there are things I don't need, so I'll clean up some utilities (e.g., the find_context function is fine and helpful, but perhaps it's overkill).
  • We fix the tests by using the isolate approach (which is cleaner than choosing your own unique ID).

Plan 2

@picnixz mentioned this pull request Apr 10, 2024