Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom test grouping and test group order logic #500

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

SalmonMode
Copy link

@SalmonMode SalmonMode commented Jan 19, 2020

Closes #18

It's been long requested (#18) to allow for custom logic for the distribution of tests among test groups in a simple and clean way. This adds that functionality by providing a hook that each tests nodeid is passed to, and whatever is returned by the hook for a given test is that test's test group name. All tests with the same test group name will be batched together and passed to the same worker together.

I also added another hook that grants access to the finished collection of test groups so that the order of the groups can be changed as needed. This allows for certain optimizations to be made, such as putting slow tests in the same group and placing that group in the beginning of the collection in order to speed the test suite up.

I also took the opportunity to clean up the logic quite a bit as there was a lot of repetition between the different load schedulers. Now they are unified into the same base class (except for EachScheduling) with the only differing logic being what their default grouping logic is. I felt I should leave EachScheduling alone for now, but I imagine it could also use the same base class by simply having every test in the same group, and then duplicating that group N times. But that seemed kinda memory heavy.

Thanks for submitting a PR, your contribution is really appreciated!

Here's a quick checklist that should be present in PRs:

  • [x ] Make sure to include reasonable tests for your change if necessary

  • [x ] We use towncrier for changelog management, so please add a news file into the changelog folder following these guidelines:

    • Name it $issue_id.$type for example 588.bugfix;

    • If you don't have an issue_id change it to the PR id after creating it

    • Ensure type is one of removal, feature, bugfix, vendor, doc or trivial

    • Make sure to use full sentences with correct case and punctuation, for example:

      Fix issue with non-ascii contents in doctest text files.
      

@nicoddemus
Copy link
Member

Hi @SalmonMode,

Thanks a lot for the PR! As you mention this is a long-requested feature so all the work you have put into it is more than welcome!

I like the approach of providing "low level" hooks that lets users and plugins customize test grouping to be sent to workers. Also so far in the review I'm very satisfied with the code quality and documentation you've put into it. 👍

I'm writing this just so to let you know that it will probably take awhile to review a PR of this magnitude, so please be patient! 😁

@SalmonMode
Copy link
Author

@nicoddemus haha I appreciate it 😁 . Take your time. I spent a very long time on it (granted, at least half of that time was on what to name the hooks).

As a side note, the name I chose was because I figure at some point in the future, this may be expanded to allow grouping based on markers, rather than just names, but I couldn't figure out how to pull those in in a good way, and wanted to get the PR in rather than holding onto it for longer.

@ssbarnea
Copy link
Member

Any chance to refresh this one?

@nicoddemus
Copy link
Member

Any chance to refresh this one?

Thanks for the ping @ssbarnea! Indeed I had lost sight of this.

Will make time for completing my review this week. 👍

@skhomuti
Copy link

@nicoddemus just reminding about the existence of this PR :)

@Wilsontomass
Copy link

Heya @nicoddemus, sorry for the bump but we would be extremely grateful for a PR of this kind to be added to pytest-xdist. Is it still feasable? Has some other functionality superseded it?

Copy link
Member

@RonnyPfannschmidt RonnyPfannschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite large, so itll be a while before I can take a deeper look

I want to note that it's unfortunate we don't have a good way to transfer details about marks with the nodeids, we should add a hook that allows to inform the coordinator about affinities (session scope fixtures, certain configurations)

@nicoddemus
Copy link
Member

Thanks a lot for the ping!

I had left a bunch of comments, but forgot to publish them 🤦 :

image

I just published them, I will give it another review later.


By default, there is no grouping logic and every individual test is placed in its own
test group, so using the ``-n`` option will send pending tests to any worker that is
available, without any guaranteed order. It should be assumed that when using this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit misleading in the sense that it suggests every test will run in isolation with their own copies of every fixture (including high-scoped fixtures like session), which is not true: it is just that each worker is its own "session", so high-scoped fixtures will live in that session as if in an isolated pytest executing.

We could rewrite that part, but just removing it altogether is an option too. What do you think?

grouping options, based on simple criteria about a test's nodeid. so you can gunarantee
that certain tests are run in the same process. When they're run in the same process,
you gunarantee that larger-scoped fixtures are only executed as many times as would
normally be expected for the tests in the test group. But, once that test group is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "But, once that test group..." suggests that xdist might be do something special in the sense to destroy the fixtures...

Perhaps we should have a separate section explaining how fixture execution in general works in xdist: each worker is its own session, so high-scope fixtures are bound to that worker, etc. This section applies to xdist in general and is not specific to the test grouping feature.

Back to the docs at hand, we can then just discuss how tests are grouped/sent to workers, without getting into details again regarding fixture setup/teardown.

What do you think? Hope my comments make sense. 😁

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, ``pytest-xdist`` doesn't group any tests together, but it provides some
grouping options, based on simple criteria about a test's nodeid. so you can gunarantee
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
grouping options, based on simple criteria about a test's nodeid. so you can gunarantee
grouping options, based on simple criteria about a test's nodeid, so you can guarantee

groups, as it creates a new groups as needed. You can tap into this system to define
your own grouping logic by using the ``pytest_xdist_set_test_group_from_nodeid``.

If you define your own copy of that hook, it will be called once for every test, and the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you define your own copy of that hook, it will be called once for every test, and the
If you define your own implementation of that hook, it will be called once for every test, and the

::

workqueue = {
'<full>/<path>/<to>/test_module.py': {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'<full>/<path>/<to>/test_module.py': {
'<group id 1>': {

Perhaps use <group id 1> to illustrate the idea that a group id is anything, not necessarily a filename?

should always be identical. This is an alias for
`.registered_collections``.

:node2pending: Map of nodes and the names of their pending test groups. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best to move this to the properties' docstring?


@property
def collection_is_completed(self):
"""Boolean indication initial test collection is complete.
"""Booleanq indication initial test collection is complete.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Booleanq indication initial test collection is complete.
"""Boolean indication initial test collection is complete.

@nicoddemus
Copy link
Member

nicoddemus commented Aug 30, 2022

we should add a hook that allows to inform the coordinator about affinities (session scope fixtures, certain configurations)

Just so we are in the same page, you mean in this PR or in a follow up?

Copy link
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the implementation and it looks good, left a few more comments.

I personally would like to have some of the current code that manipulates workloads/test groups abstracted away into separate classes, instead of having all the details implemented into the scope itself, but definitely this can be done later if someone decides to tackle it.

def pytest_xdist_order_test_groups(workqueue):
"""Sort the queue of test groups to determine the order they will be executed in.

The ``workqueue`` is an ``OrderedDict`` containing all of the test groups in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The ``workqueue`` is an ``OrderedDict`` containing all of the test groups in the
The ``workqueue`` is an ``OrderedDict`` of ``group => list of node ids`` containing all of the test groups in the

single test. This is designed to be extensible so that custom grouping
logic can be applied either by making a child class from this and
overriding the ``get_default_test_group`` method, or by defining the
``pytest_xdist_set_test_group_from_nodeid``.hook. If the hook is used, but it returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``pytest_xdist_set_test_group_from_nodeid``.hook. If the hook is used, but it returns
``pytest_xdist_set_test_group_from_nodeid`` hook. If the hook is used, but it returns

::

assigned_work = {
'<worker node A>': {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'<worker node A>': {
'gw0': {

Using ids as we have them internally will make this easier to understand I think.

::

registered_collections = {
'<worker node A>': [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'<worker node A>': [
'gw0': [

@@ -55,3 +55,55 @@ def pytest_xdist_node_collection_finished(node, ids):
@pytest.mark.firstresult
def pytest_xdist_make_scheduler(config, log):
""" return a node scheduler implementation """


@pytest.mark.trylast
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.trylast
@pytest.hookspec(trylast=True, firstresult=True)

Besides using the new syntax, I think we should add firstresult here to make it clear in the API that we will only use the first result from the hook.



@pytest.mark.trylast
def pytest_xdist_set_test_group_from_nodeid(nodeid):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def pytest_xdist_set_test_group_from_nodeid(nodeid):
def pytest_xdist_get_test_group_from_nodeid(nodeid):

Perhaps "get" better conveys that we should return the test group? To me "set" conveys that I should set the test group somewhere.



class LoadScopeScheduling(object):
class LoadScopeScheduling(LoadScheduling):
"""Implement load scheduling across nodes, but grouping test by scope.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So loadfile and loadscope are distribute the tests identically now? Or am I missing something?

@hugovk hugovk removed their request for review November 26, 2022 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use fixtures to better distribute tests via load scheduling
6 participants