From 83bdbf4b95c914a889d1faa8fba8d506bcc2f8c7 Mon Sep 17 00:00:00 2001 From: Bruno Oliveira Date: Mon, 29 Nov 2021 12:11:52 -0300 Subject: [PATCH] Revamp README * Use a document title. * Show a "short and sweet" section at the beginning highlighting the main usage of the plugin. * Add a table of contents. * Use a dedicated howto section at the end. * Move the OVERVIEW section to the main README. --- OVERVIEW.md | 76 ------------- README.rst | 314 +++++++++++++++++++++++++++++++++++----------------- 2 files changed, 210 insertions(+), 180 deletions(-) delete mode 100644 OVERVIEW.md diff --git a/OVERVIEW.md b/OVERVIEW.md deleted file mode 100644 index da0d3c4d..00000000 --- a/OVERVIEW.md +++ /dev/null @@ -1,76 +0,0 @@ -# Overview # - -`xdist` works by spawning one or more **workers**, which are controlled -by the **controller**. Each **worker** is responsible for performing -a full test collection and afterwards running tests as dictated by the **controller**. - -The execution flow is: - -1. **controller** spawns one or more **workers** at the beginning of - the test session. The communication between **controller** and **worker** nodes makes use of - [execnet](https://codespeak.net/execnet/) and its [gateways](https://codespeak.net/execnet/basics.html#gateways-bootstrapping-python-interpreters). - The actual interpreters executing the code for the **workers** might - be remote or local. - -1. Each **worker** itself is a mini pytest runner. **workers** at this - point perform a full test collection, sending back the collected - test-ids back to the **controller** which does not - perform any collection itself. - -1. The **controller** receives the result of the collection from all nodes. - At this point the **controller** performs some sanity check to ensure that - all **workers** collected the same tests (including order), bailing out otherwise. 
- If all is well, it converts the list of test-ids into a list of simple - indexes, where each index corresponds to the position of that test in the - original collection list. This works because all nodes have the same - collection list, and saves bandwidth because the **controller** can now tell - one of the workers to just *execute test index 3* index of passing the - full test id. - -1. If **dist-mode** is **each**: the **controller** just sends the full list - of test indexes to each node at this moment. - -1. If **dist-mode** is **load**: the **controller** takes around 25% of the - tests and sends them one by one to each **worker** in a round robin - fashion. The rest of the tests will be distributed later as **workers** - finish tests (see below). - -1. Note that `pytest_xdist_make_scheduler` hook can be used to implement custom tests distribution logic. - -1. **workers** re-implement `pytest_runtestloop`: pytest's default implementation - basically loops over all collected items in the `session` object and executes - the `pytest_runtest_protocol` for each test item, but in xdist **workers** sit idly - waiting for **controller** to send tests for execution. As tests are - received by **workers**, `pytest_runtest_protocol` is executed for each test. - Here it worth noting an implementation detail: **workers** always must keep at - least one test item on their queue due to how the `pytest_runtest_protocol(item, nextitem)` - hook is defined: in order to pass the `nextitem` to the hook, the worker must wait for more - instructions from controller before executing that remaining test. If it receives more tests, - then it can safely call `pytest_runtest_protocol` because it knows what the `nextitem` parameter will be. - If it receives a "shutdown" signal, then it can execute the hook passing `nextitem` as `None`. - -1. 
As tests are started and completed at the **workers**, the results are sent - back to the **controller**, which then just forwards the results to - the appropriate pytest hooks: `pytest_runtest_logstart` and - `pytest_runtest_logreport`. This way other plugins (for example `junitxml`) - can work normally. The **controller** (when in dist-mode **load**) - decides to send more tests to a node when a test completes, using - some heuristics such as test durations and how many tests each **worker** - still has to run. - -1. When the **controller** has no more pending tests it will - send a "shutdown" signal to all **workers**, which will then run their - remaining tests to completion and shut down. At this point the - **controller** will sit waiting for **workers** to shut down, still - processing events such as `pytest_runtest_logreport`. - -## FAQ ## - -> Why does each worker do its own collection, as opposed to having -the controller collect once and distribute from that collection to the workers? - -If collection was performed by controller then it would have to -serialize collected items to send them through the wire, as workers live in another process. -The problem is that test items are not easily (impossible?) to serialize, as they contain references to -the test functions, fixture managers, config objects, etc. Even if one manages to serialize it, -it seems it would be very hard to get it right and easy to break by any small change in pytest. diff --git a/README.rst b/README.rst index c40c9f3e..41f315e9 100644 --- a/README.rst +++ b/README.rst @@ -1,4 +1,6 @@ - +============ +pytest-xdist +============ .. image:: http://img.shields.io/pypi/v/pytest-xdist.svg :alt: PyPI version @@ -17,36 +19,17 @@ .. 
image:: https://img.shields.io/badge/code%20style-black-000000.svg :target: https://github.com/ambv/black -xdist: pytest distributed testing plugin -======================================== - -The `pytest-xdist`_ plugin extends pytest with some unique -test execution modes: +The `pytest-xdist`_ plugin extends pytest with new test execution modes, the most used being distributing +tests across multiple CPUs to speed up test execution:: -* test run parallelization_: if you have multiple CPUs or hosts you can use - those for a combined test run. This allows to speed up - development or to use special resources of `remote machines`_. - - -* ``--looponfail``: run your tests repeatedly in a subprocess. After each run - pytest waits until a file in your project changes and then re-runs - the previously failing tests. This is repeated until all tests pass - after which again a full run is performed. - -* `Multi-Platform`_ coverage: you can specify different Python interpreters - or different platforms and run tests in parallel on all of them. - -Before running tests remotely, ``pytest`` efficiently "rsyncs" your -program source code to the remote place. All test results -are reported back and displayed to your local terminal. -You may specify different Python versions and interpreters. - -If you would like to know how pytest-xdist works under the covers, checkout -`OVERVIEW `_. + pytest -n auto +With this call, pytest will spawn a number of worker processes equal to the number of available CPUs, and distribute +the tests randomly across them. There are also a number of `distribution modes`_ to choose from. **NOTE**: due to how pytest-xdist is implemented, the ``-s/--capture=no`` option does not work. +.. 
contents:: **Table of Contents** Installation ------------ @@ -61,29 +44,47 @@ To use ``psutil`` for detection of the number of CPUs available, install the ``p pip install pytest-xdist[psutil] +Features +-------- + +* Test run parallelization_: tests can be executed across multiple CPUs or hosts. + This allows speeding up development or using special resources of `remote machines`_. + +* ``--looponfail``: run your tests repeatedly in a subprocess. After each run + pytest waits until a file in your project changes and then re-runs + the previously failing tests. This is repeated until all tests pass, + after which a full run is performed again. + +* `Multi-Platform`_ coverage: you can specify different Python interpreters + or different platforms and run tests in parallel on all of them. + + Before running tests remotely, ``pytest`` efficiently "rsyncs" your + program source code to the remote place. + You may specify different Python versions and interpreters. It does not + install/synchronize dependencies, however. + + **Note**: this mode exists mostly for backward compatibility, as modern development + relies on continuous integration for multi-platform testing. + .. _parallelization: -Speed up test runs by sending tests to multiple CPUs ----------------------------------------------------- +Running tests across multiple CPUs +---------------------------------- To send tests to multiple CPUs, use the ``-n`` (or ``--numprocesses``) option:: - pytest -n NUMCPUS + pytest -n 8 Pass ``-n auto`` to use as many processes as your computer has CPU cores. This can lead to considerable speed ups, especially if your test suite takes a noticeable amount of time. -If a test crashes a worker, pytest-xdist will automatically restart that worker -and report the test’s failure. You can use the ``--max-worker-restart`` option -to limit the number of worker restarts that are allowed, or disable restarting -altogether using ``--max-worker-restart 0``. 
+The test distribution algorithm is configured with the ``--dist`` command-line option: -By default, using ``--numprocesses`` will send pending tests to any worker that -is available, without any guaranteed order. You can change the test -distribution algorithm this with the ``--dist`` option. It takes these values: +.. _distribution modes: -* ``--dist no``: The default algorithm, distributing one test at a time. +* ``--dist load`` **(default)**: Sends pending tests to any worker that is + available, without any guaranteed order. * ``--dist loadscope``: Tests are grouped by **module** for *test functions* and by **class** for *test methods*. Groups are distributed to available @@ -96,67 +97,31 @@ distribution algorithm this with the ``--dist`` option. It takes these values: distributed to available workers as whole units. This guarantees that all tests in a file run in the same worker. -* ``--dist loadgroup``: Tests are grouped by xdist_group mark. Groups are +* ``--dist loadgroup``: Tests are grouped by the ``xdist_group`` mark. Groups are distributed to available workers as whole units. This guarantees that all - tests with same xdist_group name run in the same worker. - -Making session-scoped fixtures execute only once -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -``pytest-xdist`` is designed so that each worker process will perform its own collection and execute -a subset of all tests. This means that tests in different processes requesting a high-level -scoped fixture (for example ``session``) will execute the fixture code more than once, which -breaks expectations and might be undesired in certain situations. - -While ``pytest-xdist`` does not have a builtin support for ensuring a session-scoped fixture is -executed exactly once, this can be achieved by using a lock file for inter-process communication. 
- -The example below needs to execute the fixture ``session_data`` only once (because it is -resource intensive, or needs to execute only once to define configuration options, etc), so it makes -use of a `FileLock `_ to produce the fixture data only once -when the first process requests the fixture, while the other processes will then read -the data from a file. - -Here is the code: - -.. code-block:: python - - import json + tests with the same ``xdist_group`` name run in the same worker. - import pytest - from filelock import FileLock + .. code-block:: python + @pytest.mark.xdist_group(name="group1") + def test1(): + pass - @pytest.fixture(scope="session") - def session_data(tmp_path_factory, worker_id): - if worker_id == "master": - # not executing in with multiple workers, just produce the data and let - # pytest's fixture caching do its job - return produce_expensive_data() - - # get the temp directory shared by all workers - root_tmp_dir = tmp_path_factory.getbasetemp().parent - - fn = root_tmp_dir / "data.json" - with FileLock(str(fn) + ".lock"): - if fn.is_file(): - data = json.loads(fn.read_text()) - else: - data = produce_expensive_data() - fn.write_text(json.dumps(data)) - return data + class TestA: + @pytest.mark.xdist_group("group1") + def test2(self): + pass + This makes sure ``test1`` and ``TestA::test2`` run in the same worker. + Tests without the ``xdist_group`` mark are distributed normally as in the ``--dist=load`` mode. -The example above can also be use in cases a fixture needs to execute exactly once per test session, like -initializing a database service and populating initial tables. +* ``--dist no``: The normal pytest execution mode, runs one test at a time (no distribution at all). -This technique might not work for every case, but should be a starting point for many situations -where executing a high-scope fixture exactly once is important. 
Running tests in a Python subprocess ------------------------------------ -To instantiate a python3.9 subprocess and send tests to it, you may type:: +To instantiate a ``python3.9`` subprocess and send tests to it, you may type:: pytest -d --tx popen//python=python3.9 @@ -253,8 +218,21 @@ at once. The specifications strings use the `xspec syntax`_. .. _`execnet`: https://codespeak.net/execnet + +When tests crash +---------------- + +If a test crashes a worker, pytest-xdist will automatically restart that worker +and report the test’s failure. You can use the ``--max-worker-restart`` option +to limit the number of worker restarts that are allowed, or disable restarting +altogether using ``--max-worker-restart 0``. + + +How-tos +------- + Identifying the worker process during a test --------------------------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *New in version 1.15.* @@ -315,7 +293,7 @@ Since version 2.0, the following functions are also available in the ``xdist`` m Identifying workers from the system environment ------------------------------------------------ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *New in version 2.4* @@ -335,7 +313,7 @@ external scripts. Uniquely identifying the current test run ------------------------------------------ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *New in version 1.32.* @@ -369,14 +347,14 @@ Additionally, during a test run, the following environment variable is defined: * ``PYTEST_XDIST_TESTRUNUID``: the unique id of the test run. Accessing ``sys.argv`` from the controller node in workers ----------------------------------------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To access the ``sys.argv`` passed to the command-line of the controller node, use ``request.config.workerinput["mainargv"]``. 
Specifying test exec environments in an ini file ------------------------------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can use pytest's ini file configuration to avoid typing common options. You can for example make running with three subprocesses your default like this: @@ -401,7 +379,7 @@ to run tests in each of the environments. Specifying "rsync" dirs in an ini-file --------------------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In a ``tox.ini`` or ``setup.cfg`` file in your root project directory you may specify directories to include or to exclude in synchronisation: @@ -419,20 +397,148 @@ where the configuration file was found. .. _`pytest-xdist repository`: https://github.com/pytest-dev/pytest-xdist .. _`pytest`: http://pytest.org -Groups tests by xdist_group mark ---------------------------------- -*New in version 2.4.* +Making session-scoped fixtures execute only once +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``pytest-xdist`` is designed so that each worker process will perform its own collection and execute +a subset of all tests. This means that tests in different processes requesting a high-level +scoped fixture (for example ``session``) will execute the fixture code more than once, which +breaks expectations and might be undesired in certain situations. + +While ``pytest-xdist`` does not have builtin support for ensuring a session-scoped fixture is +executed exactly once, this can be achieved by using a lock file for inter-process communication. + +The example below needs to execute the fixture ``session_data`` only once (because it is +resource intensive, or needs to execute only once to define configuration options, etc.), so it makes +use of a `FileLock `_ to produce the fixture data only once +when the first process requests the fixture, while the other processes will then read +the data from a file. 
-Two or more tests belonging to different classes or modules can be executed in same worker through the xdist_group marker: +Here is the code: .. code-block:: python - @pytest.mark.xdist_group(name="group1") - def test1(): - pass + import json + + import pytest + from filelock import FileLock + + + @pytest.fixture(scope="session") + def session_data(tmp_path_factory, worker_id): + if worker_id == "master": + # not executing with multiple workers, just produce the data and let + # pytest's fixture caching do its job + return produce_expensive_data() + + # get the temp directory shared by all workers + root_tmp_dir = tmp_path_factory.getbasetemp().parent + + fn = root_tmp_dir / "data.json" + with FileLock(str(fn) + ".lock"): + if fn.is_file(): + data = json.loads(fn.read_text()) + else: + data = produce_expensive_data() + fn.write_text(json.dumps(data)) + return data + + +The example above can also be used in cases where a fixture needs to execute exactly once per test session, like +initializing a database service and populating initial tables. + +This technique might not work for every case, but should be a starting point for many situations +where executing a high-scope fixture exactly once is important. + - class TestA: - @pytest.mark.xdist_group("group1") - def test2(): - pass +How does xdist work? +-------------------- + +``xdist`` works by spawning one or more **workers**, which are +controlled by the **controller**. Each **worker** is responsible for +performing a full test collection and afterwards running tests as +dictated by the **controller**. + +The execution flow is: + +1. The **controller** spawns one or more **workers** at the beginning of the + test session. The communication between **controller** and **worker** + nodes makes use of `execnet `__ and + its + `gateways `__. + The actual interpreters executing the code for the **workers** might + be remote or local. + +2. Each **worker** itself is a mini pytest runner. 
**workers** at this + point perform a full test collection, sending the collected + test-ids back to the **controller**, which does not perform any + collection itself. + +3. The **controller** receives the result of the collection from all + nodes. At this point the **controller** performs a sanity check to + ensure that all **workers** collected the same tests (including + order), bailing out otherwise. If all is well, it converts the list + of test-ids into a list of simple indexes, where each index + corresponds to the position of that test in the original collection + list. This works because all nodes have the same collection list, and + saves bandwidth because the **controller** can now tell one of the + workers to just *execute test index 3* instead of passing the full test + id. + +4. If **dist-mode** is **each**: the **controller** just sends the full + list of test indexes to each node at this moment. + +5. If **dist-mode** is **load**: the **controller** takes around 25% of + the tests and sends them one by one to each **worker** in a round + robin fashion. The rest of the tests will be distributed later as + **workers** finish tests (see below). + +6. Note that the ``pytest_xdist_make_scheduler`` hook can be used to + implement custom test distribution logic. + +7. **workers** re-implement ``pytest_runtestloop``: pytest’s default + implementation basically loops over all collected items in the + ``session`` object and executes the ``pytest_runtest_protocol`` for + each test item, but in xdist **workers** sit idly waiting for the + **controller** to send tests for execution. As tests are received by + **workers**, ``pytest_runtest_protocol`` is executed for each test. 
+ Here it is worth noting an implementation detail: **workers** always + must keep at least one test item on their queue due to how the + ``pytest_runtest_protocol(item, nextitem)`` hook is defined: in order + to pass the ``nextitem`` to the hook, the worker must wait for more + instructions from the controller before executing that remaining test. If + it receives more tests, then it can safely call + ``pytest_runtest_protocol`` because it knows what the ``nextitem`` + parameter will be. If it receives a “shutdown” signal, then it can + execute the hook passing ``nextitem`` as ``None``. + +8. As tests are started and completed at the **workers**, the results + are sent back to the **controller**, which then just forwards the + results to the appropriate pytest hooks: ``pytest_runtest_logstart`` + and ``pytest_runtest_logreport``. This way other plugins (for example + ``junitxml``) can work normally. The **controller** (when in + dist-mode **load**) decides to send more tests to a node when a test + completes, using some heuristics such as test durations and how many + tests each **worker** still has to run. + +9. When the **controller** has no more pending tests it will send a + “shutdown” signal to all **workers**, which will then run their + remaining tests to completion and shut down. At this point the + **controller** will sit waiting for **workers** to shut down, still + processing events such as ``pytest_runtest_logreport``. + +FAQ +--- + +**Question**: Why does each worker do its own collection, as opposed to having the +controller collect once and distribute from that collection to the +workers? + +If collection were performed by the controller, then it would have to +serialize collected items to send them through the wire, as workers live +in another process. The problem is that test items are not easy +(impossible?) to serialize, as they contain references to the test +functions, fixture managers, config objects, etc. 
Even if one manages to +serialize them, it seems it would be very hard to get it right and easy to +break with any small change in pytest.
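
Review note: steps 3 and 5 of the execution flow added by this patch (the collection sanity check, the conversion of test-ids to indexes, and the initial round-robin hand-out in ``load`` mode) can be sketched as below. This is only an illustration of the described behaviour, not pytest-xdist's actual scheduler code. The function names, the worker ids ``gw0``/``gw1`` and the exact rounding of the "around 25%" share are assumptions made for the example.

```python
from itertools import cycle


def check_same_collection(collections):
    """Return the shared test-id list, or raise if any worker disagrees.

    `collections` maps a (hypothetical) worker id to the ordered list of
    test-ids that worker reported after its own collection.
    """
    worker_ids = list(collections)
    reference = collections[worker_ids[0]]
    for worker_id in worker_ids[1:]:
        # Both the set of tests and their order must match.
        if collections[worker_id] != reference:
            raise RuntimeError(f"{worker_id} collected different tests")
    return reference


def initial_load_batches(num_tests, worker_ids, fraction=0.25):
    """Hand out about `fraction` of the test indexes, one by one, round robin.

    Returns the per-worker index lists and the indexes still pending, which
    would be sent later as workers finish tests.
    """
    initial_count = int(num_tests * fraction)  # assumed rounding, for illustration
    batches = {worker_id: [] for worker_id in worker_ids}
    workers = cycle(worker_ids)
    for index in range(initial_count):
        batches[next(workers)].append(index)
    pending = list(range(initial_count, num_tests))
    return batches, pending


if __name__ == "__main__":
    ids = [f"test_mod.py::test_{i}" for i in range(8)]
    collections = {"gw0": list(ids), "gw1": list(ids)}
    test_ids = check_same_collection(collections)
    batches, pending = initial_load_batches(len(test_ids), list(collections))
    print(batches, pending)
```

Because every worker holds the same ordered collection, sending an index such as ``3`` is enough for a worker to find the test, which is the bandwidth saving the README text mentions.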