
Session-Scoped Fixtures are not Session-Scoped with Pytest-Xdist #271

Open
drake-mer opened this issue Jan 27, 2018 · 65 comments

@drake-mer

I am fairly new to this project. I very recently migrated a software project test-suite from nosetest to pytest, mainly because of the Xdist benefits I had heard of.

The problem is that my tests depend on a big fixture setup (table creation + heavy loading of data) that I would like to share across all my tests.

The current Xdist behaviour is as follows:

  1. Collect tests
  2. Split tests amongst the user-defined number of processes
  3. Launch a pytest session for each worker

Obviously, if each test depends on a heavy fixture, then multiplying the number of fixture creations by the number of workers is not going to help.

Additionally, it simply breaks the expected pytest behaviour for 'session' scoped fixtures.

I think it should be fairly simple to address this problem, although I didn't take a really deep look into it. If you need help solving it, I am more than willing to contribute if you feel this suggestion for improvement is relevant.

Greetings.

@nicoddemus
Member

Hi @elijahbal,

The problem is that each worker lives in a different process, so it would be hard to share a single session scoped fixture between the workers in a general manner.

For one, you have the problem of serializing any object returned by a fixture so it can be sent to another process on demand. pickle might seem like the obvious solution to this problem, but a lot of objects are not picklable.

Another problem, much harder IMHO, is how do you keep the object returned by the fixture synchronized among the worker processes? After all, a session fixture might return a mutable object, so its state must be synchronized somehow as well.

Another problem is that one must also deal with parallelism, given that tests are now running in parallel in separate processes. The resources probably will need to be synchronized using a lock.

Those are the reasons, off the top of my head, why it would be hard to implement proper scope rules such that session-scoped fixtures are instantiated only once per session, as opposed to once per worker like it is today.
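As an aside for readers landing here: the once-per-worker behaviour is easy to observe, since pytest-xdist exposes the worker id through an environment variable. A minimal sketch (the fixture in the comment is illustrative, and `build_resource` is a hypothetical helper):

```python
import os


def current_worker_id() -> str:
    # pytest-xdist sets PYTEST_XDIST_WORKER ("gw0", "gw1", ...) in each
    # worker process; the variable is absent when running without -n.
    return os.environ.get("PYTEST_XDIST_WORKER", "master")


# In a conftest.py, a session fixture like this prints once *per worker*:
#
# @pytest.fixture(scope="session")
# def expensive_resource():
#     print(f"building resource in {current_worker_id()}")
#     return build_resource()  # hypothetical
```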

@drake-mer
Author

drake-mer commented Jan 28, 2018

Happy to have such a fast answer. Yes, the problem of each process living in its own address space is a tough one, and parallelizing the computation comes at the cost of greater complexity in handling the processes, as far as I understand.

First of all, concerning the ability to serialize and share large objects between processes: I agree with you; in my experience, it is hard to serialize complex objects with pickle.

As for the third difficulty you mention: yes, indeed, the most direct and obvious approach to a global session object is to use a lock at the operating-system level, regardless of the programming language.

For your points 1 and 2, I want to address part of the problem, and I think it will not appear so difficult after examining the use case more closely.

The typical use case for spawning large fixture objects is usually not to share big collections of references (i.e. very big objects), but rather to prepare the underlying system to be in a specific state. Examples include:

  • spawning many containerized applications for a large micro-services system
  • setting up a database in a pristine state with a clean table structure
  • gathering a big set of data onto the local storage drive from a collection of network connections
  • etc.

So here the problem is not so much to share a big object between processes but to ensure that the fixture preparation process is run only once, from a single process. The synchronization of such objects between processes is not really important, because these are precisely objects that are meant to be accessed concurrently (webserver, database, etc.).

From this point of view, I think (but maybe I am wrong) that the xdist initialization step (collection of tests and preparation of the test sessions, with distribution of tests amongst runners) is especially well suited to perform such a task.

So maybe a good compromise would be a special xdist marker for such fixtures that would take care of the initialization process. As for the return value, it would probably be best to enforce that such special fixtures return, on first call, an immutable primitive object (e.g. a string, or at most a tuple of strings) that xdist would keep in memory.

Does it seem like a good idea to you? And does it seem doable in practice given the current xdist codebase?

@ghost

ghost commented Mar 23, 2018

I'm having a similar issue with session scoped fixtures when n > 1. My project does some of the things @elijahbal points out above.

The teardown section in the session fixture is being executed more than once, and worse, out of order: before other tests have ended.

I'm looking for a way to signal "there are no more pending tests" to work around this. In that case, a worker could check whether it's the last one and either execute that section or do nothing. Is there a way? It would be very helpful! (i.e. a lock, as @nicoddemus commented)

@drake-mer
Author

No idea, actually, how this could be done. The quickest workaround I could come up with is to use a separate resource for each of the xdist processes.

@nicoddemus
Member

Hi folks, sorry for the late response, this issue slipped through the cracks.

@elijahbal

Does it seem like a good idea to you? And does it seem doable in practice given the current xdist codebase?

Not really, and not just because of pytest-xdist but because of how pytest itself is structured: the session-scoped fixture instantiated in some worker will be destroyed once that worker runs out of tests, regardless of whether other workers still need that resource.

@elijahbal and @jrodriguezntt I can't find a way to do what you need with fixtures, but perhaps plain hooks executed only on master could be responsible for creating the shared resource and destroying it (pytest_sessionstart/pytest_sessionfinish come to mind)? This has the downside that you might end up initializing a resource that won't even be used (for example if -k is used to select some tests), but it might be worth a try.

@ghost

ghost commented Apr 3, 2018

I've managed to work around this by doing nothing in the teardown section (after yield). A cleanup bash script is executed after py.test, but it's a pity, because this (IMHO) breaks the logic of the scope='session' option (once per session). I understand the technical problem, nonetheless.
One approach could be to use semaphores or any other shared resource if workers are running on the same machine. Another is to use a shared resource like memcached, but this might be a bit too complicated (not sure if it's worth it).
The hook on master is also a good idea, provided it's the last one to execute (once the workers have finished). But I have no experience using them. Any suggestions will be welcome! :)

@RonnyPfannschmidt
Member

Each process is its own session. While a scope that spans processes would be a nice feature to have, it's also important to note that there is no good, sane, and simple way to implement it - the proposed approaches are all no-gos on technical merit.

@david-fliguer-ws

Hi,

I'm experiencing this same issue: I have tests that I want to run in parallel, but a setup that has to be done only once before all the tests run.

From what I see, the setup is run once per worker process that xdist creates.

Is it possible to have a mutex between the xdist workers? Do we have access to anything like this?

@marc-h38

marc-h38 commented Apr 27, 2018

While the following workarounds are probably obvious to people discussing here, they may not be obvious to xdist noobs like me who googled and landed here while skipping the documentation. So mentioning them briefly:

    if ("PYTEST_XDIST_WORKER" not in os.environ or
            os.environ["PYTEST_XDIST_WORKER"] == "gw0"):
        <initialize something shared in common>
        <create some empty file INIT_DONE_BY_THREAD_ZERO>
    else:
        <wait for INIT_DONE_BY_THREAD_ZERO file to exist>

    if ("PYTEST_XDIST_WORKER_COUNT" not in os.environ or
            os.environ["PYTEST_XDIST_WORKER_COUNT"] == "1"):  # env var values are strings
        ....

Another problem, much harder IMHO, is how do you keep the object returned by the fixture synchronized among the worker processes? After all, a session fixture might return a mutable object, so its state must be synchronized somehow as well.

In my case and I guess many others, the part of the fixture that is time-consuming to download is read-only. Modern languages encourage read-only structures for obvious... concurrency reasons https://doc.rust-lang.org/book/second-edition/ch03-01-variables-and-mutability.html

Note in my case the "better" fix would probably be to move the downloads outside Python and to "make test" and "make clean" targets - but that is more work.

@RonnyPfannschmidt
Member

How about simply putting an HTTP download cache into .pytest-cache?

@pv

pv commented May 1, 2018

How do you synchronize access to .pytest-cache (or request.config.cache.makedir)?
Does pytest have something builtin for this?

I would be ok with a solution where the first worker that arrives does the computation and stores it in cache, and those that come after use it. But multiplatform locking is annoying...

@rogerdahl

rogerdahl commented May 1, 2018

I use Postgres, and found that create database from template is much faster than creating a database fixture from scratch. So I have each process launched by xdist create its database from a template database. If the template does not exist, I use a system wide lock to have the xdist process that gains the lock first create the template (from JSON files), while blocking the others from proceeding. When the template is ready, the lock is released, all the other processes find that the template already exists and create their database instances directly from the template. I went from maybe 10 minutes to a couple of seconds getting the database fixtures set up.

@pytest.fixture(scope='session')
def db_setup(request):
    db_set_unique_db_name(request)

    with posix_ipc.Semaphore(
        '/{}'.format(__name__), flags=posix_ipc.O_CREAT, initial_value=1
    ):
        if not db_template_exists():
            db_template_create_blank()
            db_template_populate_by_json()
            db_template_migrate()

    db_create_from_template()

    yield

    db_drop()

Notes about the lock:

posix_ipc is at https://pypi.org/project/posix_ipc/ (No affiliation).

The regular multiprocessing.Lock() context manager did not work for me here. Also tried creating the lock at module scope, and also directly calling acquire() and release(). It's probably related to how the worker processes relate to each other when launched by pytest-xdist as compared to what the multiprocessing module expects.

@smasterfree

Or use a file lock? Only when you finish your init env will the other processes go on.

@eode

eode commented Sep 26, 2018

Don't you already have IPC/an event system between Manager and Workers?

Couldn't you use the existing event system to provide fixtures xdist_group_lock, xdist_global_lock, xdist_group, and xdist_global, where:

  • xdist_xxx_lock are locks that can be acquired by workers
  • xdist_xxx are dicts restricted to hashable, serializable data, where small size is recommended

..?

Then you could easily do:

def some_test(xdist_group_lock, xdist_group):
    xdist_group_lock.acquire()   # blocks
    if not xdist_group.get('initialized'):
        initialize_things()
        xdist_group['initialized'] = True
    xdist_group_lock.release()

..and those primitives aren't too hard to implement over a network -- particularly one you already have an event system over. Yes, it would slow things down a tad for some people who use it, particularly if they abuse it, but there's still a net benefit in most cases.

@nicoddemus
Member

@eode it is probably possible. I believe this can even be implemented outside xdist as a proof of concept first.

@eode

eode commented Oct 13, 2018

I've done this kind of control with other pytest plugins by using a multiprocessing.Lock() set up in conftest.py, but those plugins used the multiprocessing module and apparently loaded conftest.py before forking / creating other processes.

I don't know the method that xdist uses to distribute work, but however xdist does it, making a lock in conftest.py doesn't work. My suspicion is that xdist either doesn't use the multiprocessing module, or it forks before conftest.py is imported.

So, the proof of concept works -- however, some kind of distributed Lock (and probably a distributed dict) would be needed, and it needs to go over a means of communication that is just as guaranteed as your normal IPC -- so, whatever IPC you normally use.

As to the general principle -- it's been tested, and works.

@RonnyPfannschmidt
Member

xdist currently uses execnet and is incompatible with multiprocessing mechanisms; #302 is supposed to alleviate that

@nicoddemus
Member

I believe this can even be implemented outside xdist as a proof of concept first.

I should have been clearer, but I meant that providing such a fixture, which uses a file lock behind the scenes, is possible.

@eode

eode commented Nov 2, 2018

There are other use cases that a file lock won't cover -- like when doing distributed testing, and using the same database back-end, or doing some other kind of setup that applies collectively. Locks can be done with channels, with the lock hosted server-side.

@rogerdahl

@eode Network-distributed testing should be covered by another type of lock, since implicit locking between machines would probably be unwanted as often as not. The new type of lock should probably use the database. I think testing based on other network-shared resources is esoteric enough that it should be handled on a case-by-case basis by the user.

@eode

eode commented Nov 2, 2018

Not that a random user matters much, but FYI this is the reason I chose and use a less-developed framework for parallel testing.

I've implemented a lock and a dict with execnet before; it's definitely doable -- and if the client isn't able to communicate with the server via channels, then all is pretty much lost anyway, as far as execnet is concerned.

What do you mean about implicit locking? That seems like a bad idea. I was thinking more of 'sessionlock', 'modulelock', and 'classlock' fixtures that would need to be explicitly requested and used by a test.

The raw fact of the matter is that sometimes, synchronization is needed between clients that aren't on the same physical system with access to the same data. Communication is also sometimes needed.

@rogerdahl

With implicit locking, I was thinking about session scoped fixtures automatically being session scoped across multiple machines on the network.

Support for various more specific scopes, like a network wide session scope, would be nice. Might be better than exposing locks directly?

@eode

eode commented Nov 4, 2018

Hey, that's not a bad idea. An 'xdist-session' scope that executes before even distributing tests would solve a lot of use cases that would otherwise require synchronization. It would even remove the need for locking in many cases.

Speaking of scopes, I discovered in pytest that the session scope executes after tests have been imported and organized. The intuitive place to mock globally seemed to be the session scope, but not so. If mocking somelib.bar from the session scope, you need to track down every case of from somelib import bar and patch each one individually.

That makes mocking some Python libraries a pain, since there's no scope that supports post-conftest, pre-code-import. I think I ended up piggy-backing on the config plugin hook or similar. That worked, but coming from outside the project it wasn't easy to figure out what to do, and it was very counterintuitive and not clearly documented -- which really stands out in pytest, because pytest is mostly intuitive and well-documented. :-)

Also to pytest's credit, the fix once found was nice and succinct.

Anyways, to the point - Additional scopes sound like a good idea to me.

@rogerdahl

I'm not familiar with execnet, but the implementation overview mentions a pytest_xdist_make_scheduler hook. Maybe that's a good starting point for adding the scopes?

Unfortunately, I probably won't have time to help with the implementation. The workaround I outlined previously in this ticket was all that was needed in my particular case.

@neXussT

neXussT commented Oct 24, 2019

I am also having this issue with n>1. My session fixtures are executed multiple times. I am testing with AWS resources, and keep hitting "Resource Conflict" due to multiple fixtures trying to write environment variables to the same lambda at the same time.
This could be solved, as stated above, by the session fixtures running once.

@RonnyPfannschmidt
Member

A key problem with that is that Python has no exchange mechanism that allows doing this safely over multiple processes.

I believe something like a pytest_configure_resources hook that sets up the global resources would help; design work on that one is needed.

@kousu

kousu commented Jan 21, 2021

test-run, test-node and session are good!

@kapilt

kapilt commented Jan 24, 2021

@kousu AFAICS this seems to be missing some things I was doing in my implementation: tracking the fixture-to-test dependency map to ensure teardown after the last scoped usage of a fixture, which is why I ended up with a WAL on the test reporting protocol from xdist worker to controller process. That makes me a little cautious about this implementation. AFAICS, looking it over, it's racing in all worker processes to create (which is fine), but then it's tearing down in any worker process that finishes its tests, which may still result in multiple fixture initializations on a given node if an unfinished worker process still has to execute a test with the given fixture.

@leorochael

leorochael commented Apr 23, 2021

What I've been using in lieu of locks in session fixtures is to define a pytest_sessionstart(session) in the root conftest.py of my project.

When running in parallel, this function will be called exactly once on the master process (which will not actually run any tests), then once on each xdist worker that is launched, before any session fixtures are run.

When not running in parallel, this function will be called only once.

The way to distinguish the master is by the presence of a workerinput attribute on session.config, like this:

import sys

def pytest_sessionstart(session):
    workerinput = getattr(session.config, 'workerinput', None)
    if workerinput is None:
        print("Running on Master Process, or not running in parallel at all.", file=sys.stderr)
    else:
        print(f"\nRunning on Worker: {workerinput['workerid']}\n", file=sys.stderr)

Besides access to workerinput, the session parameter also gives access to the pytest cache via session.config.cache.

Values set, and directories created, through session.config.cache in the master process are available in the workers, in particular in session fixtures, by requiring the pytestconfig fixture and accessing pytestconfig.cache.

Though the pytest-cache approach probably only works for xdist runs on the same machine (e.g. using the -n switch), rather than with more exotic uses of execnet, since it relies on accessing the shared cache directory, unless pytest-xdist also syncs the pytest cache directory through execnet along with the tests themselves, which I haven't tested.

@leorochael

leorochael commented Apr 23, 2021

By the way, there's also a pytest_sessionfinish(session, exitstatus):

def pytest_sessionfinish(session, exitstatus):
    workerinput = getattr(session.config, 'workerinput', None)
    if workerinput is None:
        print("\nExiting the Master Process, or running serially", file=sys.__stderr__)
    else:
        print(f"\nExiting Worker: {workerinput['workerid']}\n", file=sys.__stderr__)

Notice the use of sys.__stderr__ since sys.stderr is captured by this point.

This function is called once in each worker, strictly before it is called on the master process.

@kousu

kousu commented Apr 25, 2021

@leorochael we tried that same workaround, but we found it was flaky. In rare cases, and more often on macOS, it runs multiple times: spinalcordtoolbox/spinalcordtoolbox#3071 (comment)

I don't fully understand why. It seems like it should work. But that's why I went looking for other solutions.

@kapilt thanks for the feedback! I got sidetracked with solving other problems. I haven't looked into the bug you brought up with my implementation, but I believe you. It's hard to get concurrency stuff right.

@leorochael

leorochael commented Apr 25, 2021

@kousu, if I understand correctly, your code does this:

def pytest_sessionstart(session):
    """ Download sct_testing_data prior to test collection. """
    logger.info("Downloading sct test data")
    downloader.main(['-d', 'sct_testing_data', '-o', sct_test_path()])

But that code is guaranteed to run multiple times: once alone in the master process, then multiple times in parallel, once per worker.

If you want to guarantee that code to run only once, in the master process only, you must do:

def pytest_sessionstart(session):
    """ Download sct_testing_data prior to test collection. """
    if getattr(session.config, 'workerinput', None) is not None:
        # No need to download, the master process has already done that.
        return
    logger.info("Downloading sct test data")
    downloader.main(['-d', 'sct_testing_data', '-o', sct_test_path()])

Or if you want to save even more time between test runs on the same machine, check if the data isn't already downloaded before calling downloader.main(...)

def pytest_sessionstart(session):
    """ Download sct_testing_data prior to test collection. """
    if getattr(session.config, 'workerinput', None) is not None:
        # No need to download, the master process has already done that.
        return
    test_path = sct_test_path()
    if os.path.exists(test_path):
        logger.info("Already downloaded sct test data")
        return
    logger.info("Downloading sct test data")
    downloader.main(['-d', 'sct_testing_data', '-o', test_path])

For other codebases that don't have their own infrastructure for locating where to save files, I recommend using session.config.cache.makedir(), then checking if the expected files are present inside the created directory, or using session.config.cache.get() before calling session.config.cache.set().

If you do that, you don't even need to check getattr(session.config, 'workerinput', None), as the master process, which is guaranteed to run pytest_sessionstart() alone before launching workers, will have already downloaded the data before pytest_sessionstart() is launched in each worker in parallel.

I've made this flowchart to make the sequence of steps more clear:

[Flowchart image: PyTest XDist Session Flowchart]

@leorochael

leorochael commented Apr 25, 2021

By the way, when running in each worker, the content of the session.config.workerinput dictionary is something like:

{
    'workerid': 'gw0',  # or gw1, gw2, etc...
    'workercount': 2,  # Or whatever you passed as argument to `pytest -n`
    'testrunuid': '2d1c(...)f350',
    'mainargv': ['/path/to/venv/bin/pytest', '-n', '2'],  # Or whatever your `pytest` cmdline was
}

So you can do things like create separate directory structures, restore dumps into separate databases, or do whatever you need so that your tests running in different workers don't step on each other's toes, by using session.config.workerinput['workerid'] to create distinct names for your structures.

Please feel free to use any of the above to enhance the docs for pytest-xdist.

ack-bot pushed a commit to aws-controllers-k8s/ec2-controller that referenced this issue Sep 8, 2021
Issue #, if available: [#489](aws-controllers-k8s/community#489)

Description of changes:
* adds route table resource
* refactors e2e tests to share common resources (i.e. VPC)
  * otherwise, vpcs would need to be created for each test which is inefficient and may exceed quota
  * now, only 2 vpcs are created per test run
  * could not use `@pytest.fixture(scope="session")` on vpc because we use [pytest-xdist](pytest-dev/pytest-xdist#271)
* adds route table tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@ktbarrett

Is there any status on the implementation of the per-node and per-pytest-invocation scopes? I didn't see anything linked in this thread, but perhaps something wasn't linked.

We are using pytest-xdist to run tests in parallel and want to share a single "build" step between those tests. The "session" scope doing a build per worker is fine, but it would be nice to instead do this only once per node to reduce disk usage and maybe a little time.

@nicoddemus
Member

Hi @ktbarrett,

Is there status on the implementation of the per-node and per-pytest-invocation scopes?

AFAIK nobody is working on it actively.

@leorochael

@ktbarrett, the way to run a setup only once per session, instead of once per worker, is to declare a pytest_sessionstart(session) and check that getattr(session.config, 'workerinput', None) is None, as I mentioned previously.

@ktbarrett

@leorochael Thanks for the hint, but that's not a solution that I want to recommend to end users. Perhaps it's fine if I package that solution for them.

@nicoddemus It seems some support would need to be added to remote.py to run such fixtures, which doesn't seem that bad; the real problem seems to be registering a new scope with pytest. I didn't see that mentioned in the plugin documentation -- do you have any idea? I am willing to work on this, but I would probably only work on the per-node scope.

@nicoddemus
Member

nicoddemus commented Feb 15, 2022

I didn't see that mentioned in the plugin documentation, do you have any idea?

Currently it is not possible to add a new fixture scope via a plugin, I'm afraid.

I suggest instead you package your solution into a function/context manager of sorts -- say, a function which receives config and returns the fixture value, using an inter-process lock to ensure it is computed only once.

Say instead of users writing this:

@pytest.fixture(scope="session")
def value():
    return compute_value_slow()

As is, each worker will execute its own copy of compute_value_slow.

From the top of my head, I guess something like this can be implemented:

@pytest.fixture(scope="session")
def value():
    with ensure_unique_computation() as signal:
        if signal.compute:
            signal.value = compute_value_slow()
    return signal.value

ensure_unique_computation can then use an inter-process lock so:

  1. The first process entering the context manager acquires an inter-process lock and receives a signal object with signal.compute == True and signal.value == None. This tells the caller that it should compute the fixture value and assign it to signal.value.
  2. The next processes entering the context manager should wait on the inter-process lock.
  3. The first process, when exiting the context manager, now has the computed value in signal.value. It should serialize it to disk somewhere (probably next to the lock), and release the lock.
  4. The other processes, upon getting the lock, should now be able to deserialize the object from disk and assign it to their signal.value.

Now that I think about it, it might even be something built into pytest-xdist itself, if somebody can make it work and provide a PR.

Also see: https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once

@ktbarrett

ktbarrett commented Feb 15, 2022

Seeing as the body of value would just be repeated inside the if signal.compute branch, how about a decorator instead?

@pytest.fixture(scope="session")
@xdist.per_node_fixture
def value():
    return compute_value_slow()

You could even merge the two decorators and provide an xdist.fixture that handles scope="node". My gut tells me the reaction to that would be "no".

EDIT: That would be annoying to implement considering fixtures can use other fixtures. Never mind.
EDIT: And an xdist.fixture supporting scope="node" would need to ensure fixture ordering like pytest does, which it can't really do without inserting itself into pytest's flow. Unfortunately, pytest's fixture flow seems to be hardcoded and shotgunned across the codebase. So that isn't viable either. Guess the context manager is really the only possibility.

Now that I think about it, it might even be something built into pytest-xdist itself, if somebody can make it work and provide a PR.

Sure thing.

@RonnyPfannschmidt
Member

Currently there is no safe way to have "per combined pytest test-run" fixtures.

In particular it's actually more tricky than it looks, as you basically have multiple dimensions of interaction:

  • hosts/nodes
  • users
  • just the normal parallelization

There are types of fixtures one would want to run once per each of those dimensions.

Right now I'm not aware of a clear-cut way to handle the details for many of them.

The common suggestion for now is to rely on parallelization and use locks/caches.

While I'd love to see better coordination tools, I currently don't have a use-case for them myself, nor the spare cycles to attack this problem differently.

@philippefutureboy

philippefutureboy commented Jun 21, 2022

I implemented a simple file-based solution for when one needs to run a side effect once.
Not the best, but if it can help someone, here goes:

import os
import shutil
import time

import pytest


@pytest.fixture(scope="session", autouse=True)
def before_all(tmp_path_factory):
    TMP_ROOT = tmp_path_factory.getbasetemp().parent
    LOCK_PATH = TMP_ROOT / 'before_all.lock'

    # No worker id means xdist is not in use, so treat it like gw0.
    if os.environ.get('PYTEST_XDIST_WORKER', 'gw0') == 'gw0':
        shutil.rmtree("path/to/output/folder")  # do your action / for me it's clearing out a non-temp folder
        open(LOCK_PATH, 'a').close()
    else:
        while not LOCK_PATH.is_file():
            time.sleep(0.1)  # wait until the LOCK_PATH exists

If you need to clean up what you set up after your tests, you can write to the LOCK_PATH some sort of semaphore on exit, e.g. an integer. If the integer equals the number of workers, then you know you are the last one before all the runs are done, and you can tear down the common infrastructure/state.
If you need to do this across a decentralized system, you could use a memcached or Redis instance and save your semaphore as a key:value pair where the key identifies the current run (e.g. the current commit hash).

Hopefully that can help!

@nicoddemus
Member

Thanks for sharing @philippefutureboy,

There's a documented solution similar to yours but which is more robust in https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once.

@ofek

ofek commented Jul 31, 2022

I hit this today. If it helps anyone, this is how I'm handling global setup/teardown of Docker containers: https://github.com/pypa/hatch/blob/a9d36c75fc1f7785861772e37635212770f49ac7/tests/conftest.py#L182-L232

Essentially, I'm populating 2 global directories with worker IDs representing sessions that have started and sessions that have ended.

@maciejmatczak

maciejmatczak commented Sep 13, 2022

I was thinking about a different solution.

Is it possible to group session-fixture-dependent tests? Somewhat automatically would be preferred. Examples of my scenarios below.

I have fixtures that are heavy to set up and that I later reuse. I would like the parallelization (test grouping) to happen exactly around these fixtures, with reuse happening synchronously. I am spinning up multiple servers with which I need to communicate.

Simple example:

import pytest
import time


@pytest.fixture(scope="session")
def heavy1():
    time.sleep(10)

    yield "heavy1"


@pytest.fixture(scope="session")
def heavy2():
    time.sleep(10)

    yield "heavy2"


def test_something(heavy1):
    assert heavy1 == "heavy1"


def test_something_else(heavy1):
    assert heavy1 == "heavy1"


def test_different_stuff(heavy2):
    assert heavy2 == "heavy2"

I would like, when running with -n2, to have the first 2 tests in the first worker and the 3rd one in the second worker. With that split, the heavy1 fixture would be set up once, and only in the first worker. Analogously for heavy2.

That would parallelize the tests execution where I need it parallelized.

My real case scenario fixtures are more like:

import pytest
import time

from foo import server


@pytest.fixture(scope="session", params=[
    "a",
    "b",
    "c",
])
def heavy(request):
    with server.start(param=request.param) as connection:
        yield connection

def test_something(heavy):
    assert heavy == "heavy"

def test_different_thing(heavy):
    assert heavy == "heavy1"

I parametrize the fixture and I would like to group tests per effective fixture value, like:

  • test_something[a], test_different_thing[a]
  • test_something[b], test_different_thing[b]
  • test_something[c], test_different_thing[c]

That way I don't need pickling, while my session scoped fixtures are executed once.
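For the record, later pytest-xdist versions address part of this request directly: the xdist_group mark plus --dist loadgroup schedules every test in a named group on the same worker, so a session-scoped fixture used only by that group is set up once. Group names here are illustrative, and parametrized fixtures still have to be grouped by hand:

```python
import pytest

# Run with:  pytest -n2 --dist loadgroup
# All "heavy1" tests land on one worker and "heavy2" tests on another, so
# each heavy session fixture is built only once.


@pytest.mark.xdist_group(name="heavy1")
def test_something():
    pass


@pytest.mark.xdist_group(name="heavy1")
def test_something_else():
    pass


@pytest.mark.xdist_group(name="heavy2")
def test_different_stuff():
    pass
```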
