
Clarify if parallel test execution affects pytest_sessionstart #12341

Closed
foobar-coder opened this issue Jul 13, 2021 · 10 comments

@foobar-coder commented Jul 13, 2021

I am wondering if `pytest_sessionstart` is executed only once when Pants runs Python tests in parallel.

There are libraries out there to control single execution of setup code for xdist (e.g., see pytest-xdist's issue #271).

But it's unclear to me if pants needs something like that.

I could look into the source code but I am looking for some sort of "official" answer for this.

(I know that the --debug option forces non-parallel execution, but that's too slow.)

@jsirois (Member) commented Jul 13, 2021

Pants executes each test file separately via pytest in a separate process with no other coordination. Almost literally: `pytest my/file.py` in one subprocess and `pytest my/other_file.py` in another when on your local machine (the default), or via separate subprocesses on separate machines when remoting is enabled.

So the pytest integration is "dumb" in this regard as nearly all Pants integrations with tools are by default. In general, Pants just creates a hermetic filesystem tree and executes the smallest performant cacheable unit of work it can in a subprocess.
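In other words, the execution model is roughly the following (a conceptual sketch only, not Pants' actual implementation; the file names are made up):

```python
import shlex

# Conceptual model only: each test file gets its own independent pytest
# invocation, with no coordination between the subprocesses.
test_files = ["my/file.py", "my/other_file.py"]
commands = [shlex.join(["pytest", f]) for f in test_files]
for cmd in commands:
    print(cmd)
```

Each command runs in its own hermetic sandbox, so module-level state (or a `pytest_sessionstart` hook) runs once per test file, not once per overall run.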

@jsirois (Member) commented Jul 13, 2021

So, if Pants did gain support in the core rules or a library, there would be two separate cases:

  1. Interprocess locks to ensure single-setup of things like databases on a single host running Pants locally.
  2. Distributed locks for the remoting case.

This may well be useful and improve test throughput in certain cases, but it's also non-trivial, afaict.
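For case 1, a minimal sketch of what an interprocess "run setup once" guard could look like on a single host (assuming POSIX; `setup_once` and the lock/marker paths are illustrative names, not a Pants or Pex API):

```python
import fcntl
import os
import tempfile

def setup_once(lock_path: str, marker_path: str, setup) -> None:
    """Run `setup` exactly once across cooperating processes on one host.

    Takes an exclusive advisory file lock, then checks a marker file;
    only the first process to acquire the lock performs the setup.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            if not os.path.exists(marker_path):
                setup()
                with open(marker_path, "w") as marker:
                    marker.write("done")
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

# Example: simulate two test processes racing to create a database.
tmp = tempfile.mkdtemp()
lock = os.path.join(tmp, "db.lock")
marker = os.path.join(tmp, "db.ready")
calls = []
setup_once(lock, marker, lambda: calls.append("created db"))
setup_once(lock, marker, lambda: calls.append("created db"))
print(len(calls))  # the setup callback ran only once
```

The lock and marker files would have to live outside the hermetic sandbox for this to work across Pants-launched subprocesses, which is part of why this is non-trivial to support in core.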

@foobar-coder (Author)

Thanks for the explanation @jsirois

At present I would only need something for scenario 1:

  1. Interprocess locks to ensure single-setup of things like databases on a single host running Pants locally.

Is there any way to tell pants to invoke some kind of hook before running any of the tests in a given python_tests target?

For example, before running python_tests you have to execute runtime_package_dependencies. Maybe the runtime dependency for the tests could be a fake python_distribution whose setup_py_commands include running the initialization code needed for the tests (e.g., create a file-based DB or some other needed resource). Maybe --test-force could also force rebuilding the runtime_package_dependencies of a given target.

@jsirois (Member) commented Jul 13, 2021

Is there any way to tell pants to invoke some kind of hook before running any of the tests in a given python_tests target?

Not that I'm aware of, though I do feel like someone was working on this feature. You might ask on the pantsbuild Slack.

For example, before running python_tests you have to execute runtime_package_dependencies.

I think that would be one path to hack - yes.

I will note that I personally do use an interprocess lock (two, actually) over in Pex, which uses tox / pytest-xdist: https://github.com/pantsbuild/pex/blob/5a7c0939392d6b462ca60d77393e5175284a1e67/pex/testing.py#L434-L479. In that example, the atomic_directory context manager uses a file lock under the covers to ensure a single process sets up a hermetic pyenv in a well-known directory that all test subprocesses block on.

@Eric-Arellano (Contributor)

Is there any way to tell pants to invoke some kind of hook before running any of the tests in a given python_tests target?

There is! There's a plugin hook, PytestPluginSetupRequest, which allows you to do things like set up a database used by all tests. It's not yet documented with a guide, but see:

```python
@dataclass(frozen=True)
class PytestPluginSetup:
    """The result of custom set up logic before Pytest runs.

    Please reach out if you would like certain functionality, such as allowing your plugin to set
    environment variables.
    """

    digest: Digest = EMPTY_DIGEST


@union
@dataclass(frozen=True)  # type: ignore[misc]
class PytestPluginSetupRequest(ABC):
    """A request to set up the test environment before Pytest runs, e.g. to set up databases.

    To use, subclass PytestPluginSetupRequest, register the rule
    `UnionRule(PytestPluginSetupRequest, MyCustomPytestPluginSetupRequest)`, and add a rule that
    takes your subclass as a parameter and returns `PytestPluginSetup`.
    """

    target: Target

    @classmethod
    @abstractmethod
    def is_applicable(cls, target: Target) -> bool:
        """Whether the setup implementation should be used for this target or not."""
```

And a simple example:

```python
class UsedPlugin(PytestPluginSetupRequest):
    @classmethod
    def is_applicable(cls, target: Target) -> bool:
        return True


class UnusedPlugin(PytestPluginSetupRequest):
    @classmethod
    def is_applicable(cls, target: Target) -> bool:
        return False


@rule
async def used_plugin(_: UsedPlugin) -> PytestPluginSetup:
    digest = await Get(Digest, CreateDigest([FileContent("used.txt", b"")]))
    return PytestPluginSetup(digest=digest)


@rule
async def unused_plugin(_: UnusedPlugin) -> PytestPluginSetup:
    digest = await Get(Digest, CreateDigest([FileContent("unused.txt", b"")]))
    return PytestPluginSetup(digest=digest)


def test_setup_plugins_and_runtime_package_dependency(rule_runner: RuleRunner) -> None:
    # We test both the generic `PytestPluginSetup` mechanism and our `runtime_package_dependencies`
    # feature in the same test to confirm multiple plugins can be used on the same target.
    rule_runner = RuleRunner(
        rules=[
            *rule_runner.rules,
            used_plugin,
            unused_plugin,
            UnionRule(PytestPluginSetupRequest, UsedPlugin),
            UnionRule(PytestPluginSetupRequest, UnusedPlugin),
        ],
        target_types=rule_runner.target_types,
    )
    rule_runner.write_files(
        {
            f"{PACKAGE}/say_hello.py": "print('Hello, test!')",
            f"{PACKAGE}/test_binary_call.py": dedent(
                f"""\
                import os.path
                import subprocess

                def test_embedded_binary():
                    assert os.path.exists("bin.pex")
                    assert b"Hello, test!" in subprocess.check_output(args=['./bin.pex'])

                    # Ensure that we didn't accidentally pull in the binary's sources. This is a
                    # special type of dependency that should not be included with the rest of the
                    # normal dependencies.
                    assert not os.path.exists("{PACKAGE}/say_hello.py")

                def test_additional_plugins():
                    assert os.path.exists("used.txt")
                    assert not os.path.exists("unused.txt")
                """
            ),
            f"{PACKAGE}/BUILD": dedent(
                """\
                python_library(name='bin_lib', sources=['say_hello.py'])
                pex_binary(name='bin', entry_point='say_hello.py', output_path="bin.pex")
                python_tests(runtime_package_dependencies=[':bin'])
                """
            ),
        }
    )
    tgt = rule_runner.get_target(Address(PACKAGE, relative_file_path="test_binary_call.py"))
    result = run_pytest(rule_runner, tgt)
    assert result.exit_code == 0
```

We're happy to help in the #plugins channel in Slack (or here) with developing a plugin.

@jsirois (Member) commented Jul 14, 2021

That was it. It's a bit unfortunate, though, since ~everyone will need to write a plugin. It would be great if we had a more declarative way to do this with BUILD file target metadata, etc.

@Eric-Arellano (Contributor)

Agreed. Fwiw, this plugin hook is how we implement runtime_package_dependencies. We could continue to add prebuilt setup functionality like that as common use cases emerge.

@foobar-coder (Author)

Thanks @Eric-Arellano

Just to be sure I understand: if I have N python_tests targets in my repo and I only want one of them to execute the initialization code, the plugin mechanism you suggested will allow me to do that, right? If so, is there anything special I need to do?

In my case, the majority of tests are unit tests and don't need any kind of extra initialization. It's only a few integration tests that do, and they are not always executed (unlike the unit tests).

P.S. I agree with John about using a declarative way being better, but this is a good start.

@Eric-Arellano (Contributor)

If I have N python_tests targets in my repo and I only want one of them to execute the initialization code, the plug-in mechanism you suggested will allow me to do that, right? If so, is there anything special I need to do?

Yes. You can implement the is_applicable() classmethod appropriately, e.g. to only run for certain values of target.address, or to check whether the target has a certain field.
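For example, a sketch of an `is_applicable()` that keys off a target tag (using a hypothetical stand-in class for Pants' real `Target` type, just to show the shape; the `needs_db` tag name is made up):

```python
from dataclasses import dataclass

# Hypothetical stand-in for Pants' Target type, for illustration only.
@dataclass(frozen=True)
class FakeTarget:
    address: str
    tags: tuple = ()

class DbSetupRequest:
    """Sketch of a setup request that only applies to tagged targets."""

    @classmethod
    def is_applicable(cls, target) -> bool:
        # Only run the expensive setup for targets tagged "needs_db"
        # in their BUILD file; plain unit tests are skipped.
        return "needs_db" in target.tags

unit = FakeTarget(address="src/tests:unit")
integration = FakeTarget(address="src/tests:integration", tags=("needs_db",))
print(DbSetupRequest.is_applicable(unit))         # False
print(DbSetupRequest.is_applicable(integration))  # True
```

In a real plugin, `DbSetupRequest` would subclass `PytestPluginSetupRequest` and be registered via `UnionRule`, as in the earlier example.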

P.S. I agree with John about using a declarative way being better, but this is a good start.

Agreed. The trick is finding something generic enough for multiple organizations to benefit from, and that all the logic can be expressed via BUILD files. That worked with runtime_package_dependencies, and I imagine there may be other common cases we can expose.

@Eric-Arellano (Contributor)

Hey @foobar-coder, how is this going? Feel free to head to the #plugins channel in Slack for help: https://www.pantsbuild.org/docs/getting-help

I'm going to close, but still feel free to respond and ask questions.
