
[tests] Implementation of the new testing plugin (backward-compatible) [part 2] #12093

Open · wants to merge 69 commits into master
Conversation

@picnixz (Member) commented Mar 14, 2024

Follows #12089.

This is the smallest possible set of changes that introduces the new plugin, since pytest does not offer the possibility to enable plugins per conftest file (i.e., I cannot test the new plugin while keeping the old one for the other tests).

As such, I need to merge both plugins (the legacy and the new implementation) into one. I could, say, leave out the test part of the new plugin, but that wouldn't make sense since no one would be able to see the plugin in action otherwise (and just pushing the utility functions does not seem right either).

Once #12089 is merged, this one can follow, as it is based on top of it. After that, we need to fix each test individually so that it uses the new features (for now, the plugin is still backwards compatible, though some parts can be removed once we're done with the cleanup).

cc @jayaddison

@picnixz (Member, Author) commented Mar 14, 2024

So, for those interested in reviewing:

There are two major parts in this PR. The first part is the implementation of the plugin itself. Most of the work is done in the internal package where no magic occurs. Markers are processed and arguments are derived according to some strategy that is explained in the comments.

The second part consists of testing the plugin itself, and there's a bit of magic there. As I said before, xdist is capricious: one major drawback is the lack of output capturing. When using pytester, you essentially run an instance of pytest in-process, and you can check things directly in the tests you want. Now, imagine you have two tests and you want to be sure that test1 and test2 use different files. The issue is that you cannot check that in test1 alone, nor in test2 alone, since they need to be executed simultaneously (that is what I want). So you would usually print inside the tests, capture the output, and compare whatever you want.

The issue is that xdist does not allow you to capture ANYTHING you emit with "print". The only way to get something out is to write to disk (which I don't want to do) or to use pytest's report sections together with the '-rA' flag. That way, when test1 and test2 are done, the pytest run you started will show the report sections (even if xdist ran test1 and test2 in parallel), and NOW you can parse that output. This is the job of the "magico" plugin.
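To make the idea concrete, here is a minimal sketch of the report-section trick, assuming a hypothetical record_line fixture (illustrative only; the actual "magico" plugin in this PR is more elaborate):

```python
# Minimal sketch of the report-section trick (illustrative, not the
# actual "magico" implementation).
import pytest

@pytest.fixture
def record_line(request):
    """Collect lines and attach them to this test's report section."""
    lines = []
    yield lines.append
    # Report sections survive xdist: the parent pytest run prints them
    # in its summary when invoked with '-rA', where they can be parsed.
    request.node.add_report_section('call', 'magico', '\n'.join(lines))
```

The pytester-side test can then run pytest with '-rA' (plus '-n2' for xdist) and scan result.outlines for the recorded values.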

Feel free to ask any questions.

@chrisjsewell (Member) commented:

Heya, as per #12089, could you change your initial comment to be more specific about what issues this PR proposes to fix, and how it proposes to fix them?

@picnixz marked this pull request as draft March 14, 2024 17:14
@picnixz (Member, Author) commented Mar 14, 2024

Sure! It's easier to reply with a new comment, I think:

So, for now, this PR is meant to prepare the groundwork needed for fixing #11285. Currently, it only fixes the parts that are "detected" to be bad. More precisely, some tests have side effects because they change their sources (some of them use a shared_result). That's not good, since a test that changes its sources must be independent of every other test (unless it is a parametrized test, in which case the same change happens on every iteration).

There are "3" levels of test isolation. More precisely, the sources directory is constructed as: "TMP_DIR/NAMESPACE/CONFIG/TESTROOT-UNIQUEID" where the NAMESPACE part only depends on the module and the class of the test (i.e., two test functions in the same module are in the same NAMESPACE), "CONFIG" depends on the test configuration (i.e., the parameters passed to pytest.mark.sphinx), "TESTROOT" is the name of the test root being used (just for visual purposes) and "UNIQUEID" is an optional ID to make the directory more unique if needed.

For level 1 (the default), you don't have the "-UNIQUEID" suffix. In addition, the files in "TMP_DIR/NAMESPACE/CONFIG/TESTROOT" are marked as read-only (but only at this level).

Level 2 is only for tests using pytest.mark.parametrize. For those tests, you want to share the same test directory while still being isolated from the rest. It's as if you had level 3 isolation but turned the parametrization into one huge for-loop. With level 3, every parametrization lands in a different folder, which makes the test suite very, very slow! Level 2 has more or less the same effect as using shared_result, but you don't need to care about which shared_result ID you choose (a choice that might come back to bite you later!).

Level 3 ensures that no two tests ever use the same sources directory. In general, if we cannot correctly decide on the isolation level, we use full isolation.
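To make the three levels concrete, here is a hypothetical sketch of how they might surface at the marker level; the isolate argument and its values are assumptions for illustration, not necessarily this PR's actual API:

```python
import pytest

# Level 1 (default): shared sources, marked read-only.
@pytest.mark.sphinx('html', testroot='basic')
def test_level1(app): ...

# Level 2: one fresh directory shared by all parametrizations.
@pytest.mark.parametrize('name', ['a', 'b'])
@pytest.mark.sphinx('html', testroot='basic', isolate='grouped')
def test_level2(app, name): ...

# Level 3: a unique sources directory for this test alone.
@pytest.mark.sphinx('html', testroot='basic', isolate=True)
def test_level3(app): ...
```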


I'll try to find a way to generate smaller paths on Windows since it doesn't like long path names... Actually, the issue comes from Cython, which seems to generate a VERY VERY VERY long path, so I'll need to investigate.

EDIT: It is still possible to pass an explicit srcdir keyword to the Sphinx marker. If you do, NAMESPACE and CONFIG will be '-' and 0 respectively, and TESTROOT will be replaced by that srcdir value, possibly suffixed by some random ID for uniqueness. That way, path lengths on Windows are reduced.

In addition, I reduced the entropy of the random IDs to roughly 32 bits (8 hex chars), since otherwise the paths are too long.
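For instance, a 32-bit suffix can be produced with the standard library:

```python
import secrets

unique_id = secrets.token_hex(4)  # 4 random bytes -> 8 hex chars (~32 bits)
```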

@picnixz (Member, Author) commented Mar 14, 2024

Also, the SharedResult class used a weird construction (you use instances for fixtures but a class variable to hold whatever you want for the whole module). It works, but it's actually cleaner to use the stash object that pytest suggests instead. Also, having the fixture name clash with the marker argument while not being of the same type was confusing to me, so I renamed it (shared_result -> module_cache) but kept the old name for backwards compatibility.
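A minimal sketch of the stash-based replacement, with illustrative names (not necessarily the PR's exact ones):

```python
import pytest

# The cache dict lives on the module node's stash rather than in a
# class variable shared across the whole session.
_MODULE_CACHE_KEY = pytest.StashKey[dict]()

@pytest.fixture
def module_cache(request):
    module = request.node.getparent(pytest.Module)
    return module.stash.setdefault(_MODULE_CACHE_KEY, {})
```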

@picnixz (Member, Author) commented Mar 14, 2024

Aaaand, finally, I've introduced a better lazy app which does what you would expect it to do. For instance, using freshenv=True together with a shared result should not be accepted; likewise for "force_all" (I explained the rationale at the code level).
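Conceptually, the consistency check amounts to something like this (illustrative; the actual validation and its rationale live in the PR's code comments):

```python
from __future__ import annotations

def check_sphinx_marker(*, freshenv: bool = False, force_all: bool = False,
                        shared_result: str | None = None) -> None:
    # A shared (cached) result cannot coexist with options that force
    # the environment or all files to be rebuilt from scratch.
    if shared_result is not None and freshenv:
        raise ValueError('shared_result is incompatible with freshenv=True')
    if shared_result is not None and force_all:
        raise ValueError('shared_result is incompatible with force_all=True')
```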

Resolved review threads (now outdated) on: sphinx/testing/internal/cache.py, sphinx/testing/internal/markers.py (×3), tests/conftest.py
@picnixz marked this pull request as ready for review March 15, 2024 12:46
@jayaddison (Contributor) commented:

In the back of my mind I am thinking: the cost and time savings from this change could be significant - so is it a risk to delay/block it? It could be.

However, when I think about what a pull request to enable parallelism should contain, I would ideally expect that it should be no more than 50 lines of code - an additional dependency, some config changes, some CI workflow changes.

Instead it seems we're adding more code within the codebase to work around the fact that the application is not parallelizable in some fundamental ways. One of those is the in-process mutation of objects/imports that I'm aware autodoc and others do -- but even so my sense is that solving the problem by working backwards towards the origins would be long-term much more effective, because otherwise we're adding yet more layers that become difficult to unpick (that being the history of the test suite and plugin becoming more complex instead of less complex).

The annoying part is that, again, I don't have better suggestions, and so I think my review should be considered a +0 (no positive/negative impact on whether we merge this).

@picnixz (Member, Author) commented Mar 18, 2024

> I don't yet understand all the context, but to me it seems that running our own tests in a parallelizable way is worth decoupling from providing that same functionality to third-party test suites

Then this means not using our own plugin at all and having a plugin at the tests/ level instead. I can do that (and then sphinx/testing would not be changed at all) without too much work.

@picnixz (Member, Author) commented Mar 18, 2024

> the cost and time savings from this change could be significant - so is it a risk to delay/block it? It could be.

Not really. It's not that hard for me to move pieces around or refactor things. But in light of your previous comment, I will move everything to our local tests so that I reduce the downstream risk as much as possible. It's also more effective that way, and I'm confident that I could fix any future issue (if there's an issue, we can just switch back to the old plugin implementation and voilà).

> However, when I think about what a pull request to enable parallelism should contain, I would ideally expect that it should be no more than 50 lines of code - an additional dependency, some config changes, some CI workflow changes.

Yes, that's what I would have expected too, but xdist is not friendly with output, so some things must be done once this plugin is enabled (e.g., anything that uses a print will NOT be rendered). Now, if you enable xdist directly, you need to fix all the side effects in a single PR (otherwise the CI will not work at all!), whereas with my approach of a default isolation strategy, you can fix side effects across multiple PRs.

> Instead it seems we're adding more code within the codebase to work around the fact that the application is not parallelizable in some fundamental ways

The application is parallelizable, but it's not friendly with pytest + xdist at some points.

> One of those is the in-process mutation of objects/imports that I'm aware autodoc and others do -- but even so my sense is that solving the problem by working backwards towards the origins would be long-term much more effective, because otherwise we're adding yet more layers that become difficult to unpick (that being the history of the test suite and plugin becoming more complex instead of less complex).

Half of the side-effect bugs come from autodoc; the other half comes from modifications to the sources that are then kept between tests (thus making them fail if they are executed in an arbitrary order). Technically speaking, my PR (and the follow-ups) will not directly solve the former, since it is a Python-related issue and not a filesystem-related one.

There are hundreds of autodoc tests, and it's difficult to locate which one depends on another, since Sphinx sometimes changes a module's annotations and some modules are used more than once by the autodoc tests (sometimes only part of those modules is used). I have a fixture planned to fix those tests, but it's not in this PR yet, and you cannot really see it in effect unless you enable the parallelization (you could enable the plugin for this subset of tests, but because of the pytest integration, parametrized tests would need special-casing in order to patch the workers' instantiation).


There is ONE isolation strategy that could be achieved by pytest alone, so I'll try that one. It is something Adam tried in #11285 (comment), but it comes at the cost of a 4x slowdown. The slowdown is probably (I think) due to the parametrized tests, and the approach is not compatible with shared_result. I could try to make it compatible, but I'm not sure how good the result would be.

@jayaddison (Contributor) commented:

> > the cost and time savings from this change could be significant - so is it a risk to delay/block it? It could be.

> Not really. It's not that hard for me to move pieces around or refactor things. But in light of your previous comment, I will move everything to our local tests so that I reduce the downstream risk as much as possible. It's also more effective that way, and I'm confident that I could fix any future issue (if there's an issue, we can just switch back to the old plugin implementation and voilà).

I do believe that you'd be able to maintain/support/improve this, yep. However, I'd prefer (selfishly!) for your time to be available for core Sphinx issues instead.

Also selfishly, I don't know how much of my own time I'd be able (or want) to offer to fixing test plugin issues. Creating a certain amount of 'testing gravity' could help us make sure plugins/themes/etc. remain compatible - and if so, perhaps I'd end up feeling satisfied that we're improving ecosystem consistency and would in fact start to help - but we only have a few active maintainers/committers, so I think there's a risk of DDoS'ing ourselves.

(hard to quantify the risks, especially without data, but these are the things I'm thinking about)

@picnixz (Member, Author) commented Mar 18, 2024

Here are some plans I can suggest (honestly, it was fun on my side to play with pytest, so I don't mind the time spent).

Plan 1

  • Continue with what I did, but only for Sphinx itself. We might change the original plugin implementation in the future to make it more robust, but we will not make it xdist-friendly (that will be left to downstream extensions).
  • I move the new implementation of the fixtures into our local tests/. Maybe there are things I don't need, so I'll clean up some utilities (e.g., the find_context function is fine and helpful, but perhaps it's overkill).
  • We fix the tests by using the isolate approach (which is cleaner than choosing your own unique ID).

Plan 2

@picnixz mentioned this pull request Apr 10, 2024