
provide unique seed to each test #600

Open
brycedrennan opened this issue Jan 12, 2024 · 7 comments · May be fixed by #617

Comments

@brycedrennan

brycedrennan commented Jan 12, 2024

Description

Per #531, it's unexpected that randomly generated values in different tests would be identical. It's common in test suites to use randomly generated identifiers to help maintain test isolation and provide easier-to-understand logs.

Examples:

  • one may be using a third-party system that cannot be reset, but where isolation can be provided by using new unique identifiers for each test
  • one may be using unique S3 buckets per test (and want them to persist for review afterwards)
  • one may be reading test suite logs and want to find logs with a specific identifier related to a specific test failure

One could use --randomly-dont-reset-seed, but then reproducibility and repeatability would be lost.

I propose that the node_id of the test (and the test run seed) be used to deterministically generate a unique seed for that test (instead of all tests being reset to the same fixed value).
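As a rough sketch of what this could look like (illustrative only, not pytest-randomly's actual internals): hash the run seed together with the test's node ID, so the per-test seed is deterministic for a given --randomly-seed but different for every test.

import hashlib

def derive_test_seed(run_seed: int, node_id: str) -> int:
    # Deterministic for a given run seed, but unique per test node ID.
    digest = hashlib.sha256(f"{run_seed}:{node_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

# e.g. instead of resetting every test to the same fixed value, the plugin
# could call something like: random.seed(derive_test_seed(run_seed, item.nodeid))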

@adamchainz
Member

Using a random number per test does not guarantee isolation in remote systems. The birthday problem makes the chance of collision between many tests high, at least for 32-bit identifiers.

Why not use a shared counter between tests to absolutely guarantee unique numbers?

I am not exactly against the proposal though. It might make sense for the reason that unique random seeds will make a test suite use more random data, increasing the amount of fuzz testing.

@brycedrennan
Author

Actual scenario I encountered:
Each test run spins up a new Postgres database and populates it with the correct schema. About ten tests need "users" in the database to work. The test fixture generates a random email address of the form random-first-name.random-last-name.random-number(1-100000).

Even accounting for the birthday problem, the chances of collision are very small.

If the email addresses are unique then understanding the logs is a lot easier. You just look for the email address that relates to the test. You could even keep the database around and look at records related to that account. But if the "random" email addresses are the same in every test, that's unexpected and makes debugging a lot harder.
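A sketch of the kind of fixture described (the names and wordlists here are made up for illustration):

import random

import pytest

FIRST_NAMES = ["alice", "bob", "carol"]
LAST_NAMES = ["smith", "jones", "lee"]

@pytest.fixture
def user_email():
    # With a unique seed per test these addresses differ between tests;
    # with the current behaviour every test generates the same "random" one.
    first = random.choice(FIRST_NAMES)
    last = random.choice(LAST_NAMES)
    number = random.randint(1, 100000)
    return f"{first}.{last}.{number}@example.com"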

@bcm-at-zama

bcm-at-zama commented Mar 18, 2024

I think I have the same request as @brycedrennan, but if not, tell me and I can open another issue. I would like to be able to run with --randomly-seed=1976373286, but have the seed actually used for each test test_xxx_i of each file path_j / file_j depend on:

  • randomly seed
  • test_xxx_i string
  • path_j / file_j

I think this allows tests to share similar code while making sure that the actual seeds used in them are different, for more test coverage.

It's basically what I had done in our autoseeding_of_everything function (https://github.com/zama-ai/concrete-ml/blob/main/conftest.py#L171), before I realized that pytest-randomly somehow overwrites my seeding just afterwards, in its

def pytest_runtest_call(item: Item) -> None:
    if item.config.getoption("randomly_reset_seed"):
        _reseed(item.config)

I would be happy to participate or open a PR for that; I'll just need a bit of guidance to make the right implementation choice.

@bcm-at-zama

OK so actually, what I did with autoseeding_of_everything, combined with --randomly-dont-reset-seed, seems to do the job! We can:

  • seed everything
  • make all seeds used in practice depend on the filename / the function name / the parameters, so that even if you reuse the same kind of test body for different usages, the PRNG state used in practice will differ between them (see the sketch below)

If requested, I'll be happy to try to add it to the project!
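A simplified sketch of that workaround (not the actual concrete-ml conftest.py; reading the option as "randomly_seed" is an assumption about pytest-randomly's --randomly-seed destination), meant to be run together with --randomly-dont-reset-seed:

# conftest.py
import hashlib
import random

import pytest

@pytest.fixture(autouse=True)
def seed_everything(request):
    # Mix the base seed (e.g. the value passed via --randomly-seed) with the
    # node ID (file path, test name, parameters) so similar test bodies still
    # exercise different PRNG streams.
    base_seed = request.config.getoption("randomly_seed")
    digest = hashlib.sha256(f"{base_seed}:{request.node.nodeid}".encode()).digest()
    random.seed(int.from_bytes(digest[:8], "big"))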

@adamchainz
Member

@bcm-at-zama That sounds like roughly the same request as this one. Rather than combining the filename and test name as you've done in your fixture, we should use the pytest “test ID”, which already includes the path, plus extra details like test parameters IIRC.
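For reference, a pytest test ID (node ID) looks like tests/test_users.py::TestLogin::test_valid_password[admin] — the file path, any class, the test function name, and the parametrization ID.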

I agree with @brycedrennan that we can actually ignore the birthday problem. I still don't like the idea of trying to rely on differing random seeds to avoid collisions. But it does sound like a good idea to increase test coverage.

@bcm-at-zama , would you like to try creating a PR? Please include tests, a changelog note, and any relevant updates to the README.

@brycedrennan
Author

@adamchainz yeah, it's a good point. A global counter in a fixture (possibly combined with some more random data) would be a more thorough solution. On the other hand, tests are sometimes distributed across machines, where a global counter would be a bit tricky to implement, so it's nice to have options.
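An illustrative sketch of the counter-based alternative (not part of pytest-randomly):

import itertools

import pytest

_counter = itertools.count(1)

@pytest.fixture
def unique_id():
    # Strictly increasing within a single pytest process; under pytest-xdist
    # each worker has its own counter, so extra disambiguation would be needed.
    return next(_counter)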

@bcm-at-zama

Great, @adamchainz: I don't know about the test ID, but I'll have a look. I'll try to make this PR in April!
