From 83bdbf4b95c914a889d1faa8fba8d506bcc2f8c7 Mon Sep 17 00:00:00 2001
From: Bruno Oliveira <nicoddemus@gmail.com>
Date: Mon, 29 Nov 2021 12:11:52 -0300
Subject: [PATCH] Revamp README

* Use a document title.
* Show a "short and sweet" section at the beginning highlighting the main usage of the plugin.
* Add a table of contents.
* Use a dedicated howto section at the end.
* Move the OVERVIEW section to the main README.
---
 OVERVIEW.md |  76 -------------
 README.rst  | 314 +++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 210 insertions(+), 180 deletions(-)
 delete mode 100644 OVERVIEW.md

diff --git a/OVERVIEW.md b/OVERVIEW.md
deleted file mode 100644
index da0d3c4d..00000000
--- a/OVERVIEW.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# Overview #
-
-`xdist` works by spawning one or more **workers**, which are controlled
-by the **controller**. Each **worker** is responsible for performing
-a full test collection and afterwards running tests as dictated by the **controller**.
-
-The execution flow is:
-
-1. **controller** spawns one or more **workers** at the beginning of
-   the test session. The communication between **controller** and **worker** nodes makes use of
-   [execnet](https://codespeak.net/execnet/) and its [gateways](https://codespeak.net/execnet/basics.html#gateways-bootstrapping-python-interpreters).
-   The actual interpreters executing the code for the **workers** might
-   be remote or local.
-
-1. Each **worker** itself is a mini pytest runner. **workers** at this
-   point perform a full test collection, sending back the collected
-   test-ids back to the **controller** which does not
-   perform any collection itself.
-
-1. The **controller** receives the result of the collection from all nodes.
-   At this point the **controller** performs some sanity check to ensure that
-   all **workers** collected the same tests (including order), bailing out otherwise.
-   If all is well, it converts the list of test-ids into a list of simple
-   indexes, where each index corresponds to the position of that test in the
-   original collection list. This works because all nodes have the same
-   collection list, and saves bandwidth because the **controller** can now tell
-   one of the workers to just *execute test index 3* index of passing the
-   full test id.
-
-1. If **dist-mode** is **each**: the **controller** just sends the full list
-   of test indexes to each node at this moment.
-
-1. If **dist-mode** is **load**: the **controller** takes around 25% of the
-   tests and sends them one by one to each **worker** in a round robin
-   fashion. The rest of the tests will be distributed later as **workers**
-   finish tests (see below).
-
-1. Note that `pytest_xdist_make_scheduler` hook can be used to implement custom tests distribution logic.
-
-1. **workers** re-implement `pytest_runtestloop`: pytest's default implementation
-   basically loops over all collected items in the `session` object and executes
-   the `pytest_runtest_protocol` for each test item, but in xdist **workers** sit idly
-   waiting for **controller** to send tests for execution. As tests are
-   received by **workers**, `pytest_runtest_protocol` is executed for each test.
-   Here it worth noting an implementation detail: **workers** always must keep at
-   least one test item on their queue due to how the `pytest_runtest_protocol(item, nextitem)`
-   hook is defined: in order to pass the `nextitem` to the hook, the worker must wait for more
-   instructions from controller before executing that remaining test. If it receives more tests,
-   then it can safely call `pytest_runtest_protocol` because it knows what the `nextitem` parameter will be.
-   If it receives a "shutdown" signal, then it can execute the hook passing `nextitem` as `None`.
-
-1. As tests are started and completed at the **workers**, the results are sent
-   back to the **controller**, which then just forwards the results to
-   the appropriate pytest hooks: `pytest_runtest_logstart` and
-   `pytest_runtest_logreport`. This way other plugins (for example `junitxml`)
-   can work normally. The **controller** (when in dist-mode **load**)
-   decides to send more tests to a node when a test completes, using
-   some heuristics such as test durations and how many tests each **worker**
-   still has to run.
-
-1. When the **controller** has no more pending tests it will
-   send a "shutdown" signal to all **workers**, which will then run their
-   remaining tests to completion and shut down. At this point the
-   **controller** will sit waiting for **workers** to shut down, still
-   processing events such as `pytest_runtest_logreport`.
-
-## FAQ ##
-
-> Why does each worker do its own collection, as opposed to having
-the controller collect once and distribute from that collection to the workers?
-
-If collection was performed by controller then it would have to
-serialize collected items to send them through the wire, as workers live in another process.
-The problem is that test items are not easily (impossible?) to serialize, as they contain references to
-the test functions, fixture managers, config objects, etc. Even if one manages to serialize it,
-it seems it would be very hard to get it right and easy to break by any small change in pytest.
diff --git a/README.rst b/README.rst
index c40c9f3e..41f315e9 100644
--- a/README.rst
+++ b/README.rst
@@ -1,4 +1,6 @@
-
+============
+pytest-xdist
+============
 
 .. image:: http://img.shields.io/pypi/v/pytest-xdist.svg
     :alt: PyPI version
@@ -17,36 +19,17 @@
 .. image:: https://img.shields.io/badge/code%20style-black-000000.svg
     :target: https://github.com/ambv/black
 
-xdist: pytest distributed testing plugin
-========================================
-
-The `pytest-xdist`_ plugin extends pytest with some unique
-test execution modes:
+The `pytest-xdist`_ plugin extends pytest with new test execution modes, the most used being distributing
+tests across multiple CPUs to speed up test execution::
 
-* test run parallelization_: if you have multiple CPUs or hosts you can use
-  those for a combined test run.  This allows to speed up
-  development or to use special resources of `remote machines`_.
-
-
-* ``--looponfail``: run your tests repeatedly in a subprocess.  After each run
-  pytest waits until a file in your project changes and then re-runs
-  the previously failing tests.  This is repeated until all tests pass
-  after which again a full run is performed.
-
-* `Multi-Platform`_ coverage: you can specify different Python interpreters
-  or different platforms and run tests in parallel on all of them.
-
-Before running tests remotely, ``pytest`` efficiently "rsyncs" your
-program source code to the remote place.  All test results
-are reported back and displayed to your local terminal.
-You may specify different Python versions and interpreters.
-
-If you would like to know how pytest-xdist works under the covers, checkout
-`OVERVIEW <https://github.com/pytest-dev/pytest-xdist/blob/master/OVERVIEW.md>`_.
+    pytest -n auto
 
+With this call, pytest will spawn a number of worker processes equal to the number of available CPUs, and distribute
+the tests randomly across them. There are also a number of `distribution modes`_ to choose from.
 
 **NOTE**: due to how pytest-xdist is implemented, the ``-s/--capture=no`` option does not work.
 
+.. contents:: **Table of Contents**
 
 Installation
 ------------
@@ -61,29 +44,47 @@ To use ``psutil`` for detection of the number of CPUs available, install the ``p
     pip install pytest-xdist[psutil]
 
 
+Features
+--------
+
+* Test run parallelization_: tests can be executed across multiple CPUs or hosts.
+  This allows you to speed up development or to use special resources of `remote machines`_.
+
+* ``--looponfail``: run your tests repeatedly in a subprocess.  After each run
+  pytest waits until a file in your project changes and then re-runs
+  the previously failing tests.  This is repeated until all tests pass,
+  after which a full run is performed again.
+
+* `Multi-Platform`_ coverage: you can specify different Python interpreters
+  or different platforms and run tests in parallel on all of them.
+
+  Before running tests remotely, ``pytest`` efficiently "rsyncs" your
+  program source code to the remote place.
+  You may specify different Python versions and interpreters. It does not
+  install or synchronize dependencies, however.
+
+  **Note**: this mode exists mostly for backward compatibility, as modern development
+  relies on continuous integration for multi-platform testing.
+
 .. _parallelization:
 
-Speed up test runs by sending tests to multiple CPUs
-----------------------------------------------------
+Running tests across multiple CPUs
+----------------------------------
 
 To send tests to multiple CPUs, use the ``-n`` (or ``--numprocesses``) option::
 
-    pytest -n NUMCPUS
+    pytest -n 8
 
 Pass ``-n auto`` to use as many processes as your computer has CPU cores. This
 can lead to considerable speed ups, especially if your test suite takes a
 noticeable amount of time.
 
-If a test crashes a worker, pytest-xdist will automatically restart that worker
-and report the test’s failure. You can use the ``--max-worker-restart`` option
-to limit the number of worker restarts that are allowed, or disable restarting
-altogether using ``--max-worker-restart 0``.
+The test distribution algorithm is configured with the ``--dist`` command-line option:
 
-By default, using ``--numprocesses`` will send pending tests to any worker that
-is available, without any guaranteed order. You can change the test
-distribution algorithm this with the ``--dist`` option. It takes these values:
+.. _distribution modes:
 
-* ``--dist no``: The default algorithm, distributing one test at a time.
+* ``--dist load`` **(default)**: Sends pending tests to any worker that is
+  available, without any guaranteed order.
 
 * ``--dist loadscope``: Tests are grouped by **module** for *test functions*
   and by **class** for *test methods*. Groups are distributed to available
@@ -96,67 +97,31 @@ distribution algorithm this with the ``--dist`` option. It takes these values:
   distributed to available workers as whole units. This guarantees that all
   tests in a file run in the same worker.
 
-* ``--dist loadgroup``: Tests are grouped by xdist_group mark. Groups are
+* ``--dist loadgroup``: Tests are grouped by the ``xdist_group`` mark. Groups are
   distributed to available workers as whole units. This guarantees that all
-  tests with same xdist_group name run in the same worker.
-
-Making session-scoped fixtures execute only once
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-``pytest-xdist`` is designed so that each worker process will perform its own collection and execute
-a subset of all tests. This means that tests in different processes requesting a high-level
-scoped fixture (for example ``session``) will execute the fixture code more than once, which
-breaks expectations and might be undesired in certain situations.
-
-While ``pytest-xdist`` does not have a builtin support for ensuring a session-scoped fixture is
-executed exactly once, this can be achieved by using a lock file for inter-process communication.
-
-The example below needs to execute the fixture ``session_data`` only once (because it is
-resource intensive, or needs to execute only once to define configuration options, etc), so it makes
-use of a `FileLock <https://pypi.org/project/filelock/>`_ to produce the fixture data only once
-when the first process requests the fixture, while the other processes will then read
-the data from a file.
-
-Here is the code:
-
-.. code-block:: python
-
-    import json
+  tests with same ``xdist_group`` name run in the same worker.
 
-    import pytest
-    from filelock import FileLock
+  .. code-block:: python
 
+      @pytest.mark.xdist_group(name="group1")
+      def test1():
+          pass
 
-    @pytest.fixture(scope="session")
-    def session_data(tmp_path_factory, worker_id):
-        if worker_id == "master":
-            # not executing in with multiple workers, just produce the data and let
-            # pytest's fixture caching do its job
-            return produce_expensive_data()
-
-        # get the temp directory shared by all workers
-        root_tmp_dir = tmp_path_factory.getbasetemp().parent
-
-        fn = root_tmp_dir / "data.json"
-        with FileLock(str(fn) + ".lock"):
-            if fn.is_file():
-                data = json.loads(fn.read_text())
-            else:
-                data = produce_expensive_data()
-                fn.write_text(json.dumps(data))
-        return data
+      class TestA:
+          @pytest.mark.xdist_group("group1")
+          def test2(self):
+              pass
 
+  This makes sure ``test1`` and ``TestA::test2`` run in the same worker.
+  Tests without the ``xdist_group`` mark are distributed normally as in the ``--dist=load`` mode.
 
-The example above can also be use in cases a fixture needs to execute exactly once per test session, like
-initializing a database service and populating initial tables.
+* ``--dist no``: The normal pytest execution mode, runs one test at a time (no distribution at all).
 
-This technique might not work for every case, but should be a starting point for many situations
-where executing a high-scope fixture exactly once is important.
 
 Running tests in a Python subprocess
 ------------------------------------
 
-To instantiate a python3.9 subprocess and send tests to it, you may type::
+To instantiate a ``python3.9`` subprocess and send tests to it, you may type::
 
     pytest -d --tx popen//python=python3.9
 
@@ -253,8 +218,21 @@ at once. The specifications strings use the `xspec syntax`_.
 
 .. _`execnet`: https://codespeak.net/execnet
 
+
+When tests crash
+----------------
+
+If a test crashes a worker, pytest-xdist will automatically restart that worker
+and report the test’s failure. You can use the ``--max-worker-restart`` option
+to limit the number of worker restarts that are allowed, or disable restarting
+altogether using ``--max-worker-restart 0``.
+
+
+How-tos
+-------
+
 Identifying the worker process during a test
---------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 *New in version 1.15.*
 
@@ -315,7 +293,7 @@ Since version 2.0, the following functions are also available in the ``xdist`` m
 
 
 Identifying workers from the system environment
------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 *New in version 2.4*
 
@@ -335,7 +313,7 @@ external scripts.
 
 
 Uniquely identifying the current test run
------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 *New in version 1.32.*
 
@@ -369,14 +347,14 @@ Additionally, during a test run, the following environment variable is defined:
 * ``PYTEST_XDIST_TESTRUNUID``: the unique id of the test run.
 
 Accessing ``sys.argv`` from the controller node in workers
-----------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 To access the ``sys.argv`` passed to the command-line of the controller node, use
 ``request.config.workerinput["mainargv"]``.
 
 
 Specifying test exec environments in an ini file
-------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 You can use pytest's ini file configuration to avoid typing common options.
 You can for example make running with three subprocesses your default like this:
@@ -401,7 +379,7 @@ to run tests in each of the environments.
 
 
 Specifying "rsync" dirs in an ini-file
---------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 In a ``tox.ini`` or ``setup.cfg`` file in your root project directory
 you may specify directories to include or to exclude in synchronisation:
@@ -419,20 +397,148 @@ where the configuration file was found.
 .. _`pytest-xdist repository`: https://github.com/pytest-dev/pytest-xdist
 .. _`pytest`: http://pytest.org
 
-Groups tests by xdist_group mark
----------------------------------
 
-*New in version 2.4.*
+Making session-scoped fixtures execute only once
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``pytest-xdist`` is designed so that each worker process will perform its own collection and execute
+a subset of all tests. This means that tests in different processes requesting a high-level
+scoped fixture (for example ``session``) will execute the fixture code more than once, which
+breaks expectations and might be undesired in certain situations.
+
+While ``pytest-xdist`` does not have built-in support for ensuring a session-scoped fixture is
+executed exactly once, this can be achieved by using a lock file for inter-process communication.
+
+The example below needs to execute the fixture ``session_data`` only once (because it is
+resource intensive, or needs to execute only once to define configuration options, etc.), so it makes
+use of a `FileLock <https://pypi.org/project/filelock/>`_ to produce the fixture data only once
+when the first process requests the fixture, while the other processes will then read
+the data from a file.
 
-Two or more tests belonging to different classes or modules can be executed in same worker through the xdist_group marker:
+Here is the code:
 
 .. code-block:: python
 
-    @pytest.mark.xdist_group(name="group1")
-    def test1():
-        pass
+    import json
+
+    import pytest
+    from filelock import FileLock
+
+
+    @pytest.fixture(scope="session")
+    def session_data(tmp_path_factory, worker_id):
+        if worker_id == "master":
+            # not executing with multiple workers, just produce the data and let
+            # pytest's fixture caching do its job
+            return produce_expensive_data()
+
+        # get the temp directory shared by all workers
+        root_tmp_dir = tmp_path_factory.getbasetemp().parent
+
+        fn = root_tmp_dir / "data.json"
+        with FileLock(str(fn) + ".lock"):
+            if fn.is_file():
+                data = json.loads(fn.read_text())
+            else:
+                data = produce_expensive_data()
+                fn.write_text(json.dumps(data))
+        return data
+
+
+The example above can also be used in cases where a fixture needs to execute exactly once per test session, like
+initializing a database service and populating initial tables.
+
+This technique might not work for every case, but should be a starting point for many situations
+where executing a high-scope fixture exactly once is important.
+
 
-    class TestA:
-        @pytest.mark.xdist_group("group1")
-        def test2():
-            pass
+How does xdist work?
+--------------------
+
+``xdist`` works by spawning one or more **workers**, which are
+controlled by the **controller**. Each **worker** is responsible for
+performing a full test collection and afterwards running tests as
+dictated by the **controller**.
+
+The execution flow is:
+
+1. The **controller** spawns one or more **workers** at the beginning of the
+   test session. The communication between **controller** and **worker**
+   nodes makes use of `execnet <https://codespeak.net/execnet/>`__ and
+   its
+   `gateways <https://codespeak.net/execnet/basics.html#gateways-bootstrapping-python-interpreters>`__.
+   The actual interpreters executing the code for the **workers** might
+   be remote or local.
+
+2. Each **worker** itself is a mini pytest runner. **workers** at this
+   point perform a full test collection, sending the collected
+   test-ids back to the **controller**, which does not perform any
+   collection itself.
+
+3. The **controller** receives the result of the collection from all
+   nodes. At this point the **controller** performs a sanity check to
+   ensure that all **workers** collected the same tests (including
+   order), bailing out otherwise. If all is well, it converts the list
+   of test-ids into a list of simple indexes, where each index
+   corresponds to the position of that test in the original collection
+   list. This works because all nodes have the same collection list, and
+   saves bandwidth because the **controller** can now tell one of the
+   workers to just *execute test index 3* instead of passing the full test
+   id.
+
+4. If **dist-mode** is **each**: the **controller** just sends the full
+   list of test indexes to each node at this moment.
+
+5. If **dist-mode** is **load**: the **controller** takes around 25% of
+   the tests and sends them one by one to each **worker** in a round
+   robin fashion. The rest of the tests will be distributed later as
+   **workers** finish tests (see below).
+
+6. Note that the ``pytest_xdist_make_scheduler`` hook can be used to
+   implement custom test distribution logic.
+
+7. **workers** re-implement ``pytest_runtestloop``: pytest’s default
+   implementation basically loops over all collected items in the
+   ``session`` object and executes the ``pytest_runtest_protocol`` for
+   each test item, but in xdist **workers** sit idly waiting for the
+   **controller** to send tests for execution. As tests are received by
+   **workers**, ``pytest_runtest_protocol`` is executed for each test.
+   Here it is worth noting an implementation detail: **workers** always
+   must keep at least one test item on their queue due to how the
+   ``pytest_runtest_protocol(item, nextitem)`` hook is defined: in order
+   to pass the ``nextitem`` to the hook, the worker must wait for more
+   instructions from the controller before executing that remaining test. If
+   it receives more tests, then it can safely call
+   ``pytest_runtest_protocol`` because it knows what the ``nextitem``
+   parameter will be. If it receives a “shutdown” signal, then it can
+   execute the hook passing ``nextitem`` as ``None``.
+
+8. As tests are started and completed at the **workers**, the results
+   are sent back to the **controller**, which then just forwards the
+   results to the appropriate pytest hooks: ``pytest_runtest_logstart``
+   and ``pytest_runtest_logreport``. This way other plugins (for example
+   ``junitxml``) can work normally. The **controller** (when in
+   dist-mode **load**) decides to send more tests to a node when a test
+   completes, using some heuristics such as test durations and how many
+   tests each **worker** still has to run.
+
+9. When the **controller** has no more pending tests it will send a
+   “shutdown” signal to all **workers**, which will then run their
+   remaining tests to completion and shut down. At this point the
+   **controller** will sit waiting for **workers** to shut down, still
+   processing events such as ``pytest_runtest_logreport``.
+
+FAQ
+---
+
+**Question**: Why does each worker do its own collection, as opposed to having the
+controller collect once and distribute from that collection to the
+workers?
+
+If collection were performed by the controller, it would have to
+serialize collected items to send them through the wire, as workers live
+in another process. The problem is that test items are not easy
+(impossible?) to serialize, as they contain references to the test
+functions, fixture managers, config objects, etc. Even if one manages to
+serialize them, it seems it would be very hard to get it right and easy to
+break by any small change in pytest.