Test fails when called alone, succeeds with whole test module #57

Open
HealthyPear opened this issue Mar 30, 2023 · 8 comments

@HealthyPear

HealthyPear commented Mar 30, 2023

I have an argparse-based script named foo that accepts multiple flags (say --a and --b).

I wrote a test module for this script, testing each flag separately:

def test_foo_a(script_runner):

    result = script_runner.run(
        "foo",
        "--a",
    )

    assert result.success
    assert result.stderr == ""

def test_foo_b(script_runner):

    result = script_runner.run(
        "foo",
        "--b",
    )

    assert result.success
    assert result.stderr == ""

What happens is that,

pytest -k "test_foo_b"

fails (and I know why, I can see the traceback), but,

pytest path/to/test_module.py

succeeds.

I can try to replicate this with a script simpler than what I have, but perhaps in the meantime you can find the problem if it's a bug.

@kvas-it
Owner

kvas-it commented May 8, 2023

Hi @HealthyPear!
Sorry for the late answer, life has been busy lately.

When pytest-console-scripts runs the scripts in in-process mode, it only imports your code once, so different runs might interfere with each other. This can lead to tests behaving differently in in-process vs. subprocess mode (in subprocess mode each run gets a separate Python interpreter, which is clean and more realistic, but unfortunately slow).

Try running both tests in subprocess mode (for example by passing --script-launch-mode=subprocess to pytest) and see if you get the expected behavior. If you do, that would mean that most likely the different runs of the script in in-process mode are influencing each other. This could be because of some initialisation that happens when you run both tests but not with only one test, or something like that. These kinds of situations can be tricky to figure out but if nothing else helps, you can use the subprocess mode to get a more realistic behavior.
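The kind of interference described above can be sketched with a hypothetical module (an illustration, not the reporter's actual script) that keeps module-level state:

```python
# Hypothetical module behind the `foo` script: module-level state
# survives between in-process runs within one pytest session.
_cache: dict = {}

def handle_a():
    # Populates shared state as a side effect of running `foo --a`.
    _cache["config"] = {"ready": True}

def handle_b():
    # Works only if handle_a() already ran in the same interpreter;
    # raises KeyError when `foo --b` is the first thing executed.
    return _cache["config"]["ready"]
```

In in-process mode both tests share this module object, so `test_foo_b` can pass when `test_foo_a` ran first in the same session but fail in isolation; subprocess mode gives each run a fresh interpreter and exposes the dependency.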

Cheers,
Vasily

@HexDecimal HexDecimal added the bug label May 16, 2023
@HexDecimal
Collaborator

HexDecimal commented May 24, 2023

In-process mode reads, compiles, and executes the script. The script itself won't share data but anything it imports will be shared across all runs of all scripts for the entire session. This is bad since scripts are supposed to run from a clean state.

One option is to mess with sys.modules, cleaning it before running a script will cause a reload of imported modules. However, editing sys.modules is not thread-safe and might break test parallelism. I'm not sure if this project is trying to support that kind of thing.
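A rough sketch of that idea (an assumption about how it could look, not the plugin's implementation): snapshot `sys.modules` before executing the script and restore it afterwards, so imports made by one run don't leak into the next. As noted above, mutating `sys.modules` like this is not thread-safe.

```python
import sys

def run_with_fresh_modules(script_path: str) -> None:
    # Snapshot the currently loaded modules.
    snapshot = dict(sys.modules)
    try:
        with open(script_path) as f:
            code = compile(f.read(), script_path, "exec")
        # Execute the script as if it were the __main__ module.
        exec(code, {"__name__": "__main__"})
    finally:
        # Drop anything the script imported so the next run
        # re-imports its dependencies from a clean state.
        sys.modules.clear()
        sys.modules.update(snapshot)
```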

Sometime in the future Python should support PEP 554 subinterpreters which can simulate a clean start for scripts.

@HexDecimal
Collaborator

The downside to actually setting up a fresh state is that any mocks will be broken. So I'm not sure it's even a good idea to try to fix this.

@kvas-it
Owner

kvas-it commented May 24, 2023

One option is to mess with sys.modules, cleaning it before running a script will cause a reload of imported modules. However, editing sys.modules is not thread-safe and might break test parallelism. I'm not sure if this project is trying to support that kind of thing.

When I started the project, my thinking was that in-process mode is a speedup for the cases where it works, and where it doesn't we just fall back to subprocess mode. Implementing module unloading to make in-process mode behave like subprocess mode would be complicated and would still have corner cases that fail, so I considered it not worth the effort back then (now I unfortunately have less time for the project, so this assessment hasn't changed). I'm not opposed to it in principle if someone would like to implement it, but I am concerned that it might make the project harder to maintain, so we'd need to discuss the details.

Sometime in the future Python should support PEP 554 subinterpreters which can simulate a clean start for scripts.

Yes, this would work, although I'm not sure how much of a speedup we'd get with subinterpreters compared to subprocess. If/when subinterpreters are implemented, the most logical way seems to make it a new execution mode that is somewhere between inprocess and subprocess.

The downside to actually setting up a fresh state is that any mocks will be broken. So I'm not sure it's even a good idea to try to fix this.

I agree.

@HexDecimal
Collaborator

Yeah, I was just greatly overthinking this for a moment. Inprocess is good to have, but having it as the default might cause more issues like the one seen here.

As interesting as it'd be to implement subinterpreters that'd probably just be a novelty rather than a useful feature. Subprocess already works well for isolating scripts.

So the issue's been identified and there's not much that can be changed to prevent this other than maybe further clarifying the two modes in the documentation or changing the default.

Maybe a later release could change the default to subprocess and have inprocess be something opted into for performance or mocking, and then right now just warn if nothing is explicitly set as the default in a conftest.py or other config.
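Whatever the eventual default, a project can already pin the mode explicitly. Per the plugin's README (option name assumed current), pytest picks it up as an ini option:

```ini
# pytest.ini -- make subprocess mode the project-wide default explicitly,
# instead of relying on the plugin's built-in default of inprocess
[pytest]
script_launch_mode = subprocess
```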

@kvas-it
Owner

kvas-it commented May 24, 2023

So the issue's been identified and there's not much that can be changed to prevent this other than maybe further clarifying the two modes in the documentation or changing the default.

Maybe a later release could change the default to subprocess and have inprocess be something opted into for performance or mocking, and then right now just warn if nothing is explicitly set as the default in a conftest.py or other config.

Yes, makes sense. Inprocess became the default not because of deep considerations but just because that's how I was using the plugin most of the time. The primary motivation for pytest-console-scripts was speed and the idea was something like "run tests inprocess during development for fast iteration and in both modes in CI to make sure inprocess side-effects are not affecting the outcome". At this point I'm leaning towards improving the documentation to make the modes and the differences between them more prominent. I don't see a very strong case for changing the default mode so I'd rather leave it as is to not force all the users to adapt their test suites.

@arcsector

In case anyone else comes across this issue when configuring individual tests, the docs also show that marking a single test with the library's decorator to run it in subprocess mode works as well:

import pytest

@pytest.mark.script_launch_mode('subprocess')
def test_foo_a(script_runner):
    result = script_runner.run("foo", "--a")
    assert result.success

What's still unclear to me is how to get a library that uses entry_points={"console_scripts":[]} in setup.py working in inprocess mode rather than only in subprocess mode... For me, testing on Windows, I only get the following error during inprocess execution:

Traceback (most recent call last):
  File "C:\Users\myuser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pytest_console_scripts.py", line 195, in exec_script
    compiled = compile(script.read(), str(script), 'exec', flags=0)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 2: invalid start byte

Perhaps this is a Windows-specific issue with decoding stdout and stderr?

@HexDecimal
Collaborator

Windows often has locale issues and will refuse to use a UTF-8 encoding until it's forced to with locale.setlocale. Python has a forced UTF8 mode but I'm not sure it always works. All other platforms tend to default to a utf-8 encoding.
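For what it's worth, the failure in the traceback above is plain UTF-8 decoding of non-UTF-8 bytes; a minimal reproduction (the byte values are chosen only to mirror the reported error, they are not taken from the actual file):

```python
# 0x90 is a UTF-8 continuation byte appearing without a lead byte,
# so decoding raises "invalid start byte" at position 2 -- the same
# error shape as in the traceback above.
data = bytes([0x4D, 0x5A, 0x90, 0x00])
try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.reason, err.start)  # invalid start byte 2
```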

Regardless, encoding errors are unrelated to the current issue. This should've been a new issue.
