
provide unique seed to each test #600

Open
brycedrennan opened this issue Jan 12, 2024 · 7 comments · May be fixed by #617

Comments

@brycedrennan

brycedrennan commented Jan 12, 2024

Description

Per #531, it's unexpected that randomly generated values in different tests would be identical. It's common in test suites to use randomly generated identifiers to help maintain test isolation and provide easier-to-understand logs.

Examples:

  • one may be using a third-party system that cannot be reset, but where isolation can be provided by using new unique identifiers for each test
  • one may be using unique S3 buckets per test (and want them to persist for review afterwards)
  • one may be reading test suite logs and want to find logs with a specific identifier related to a specific test failure

One could use --randomly-dont-reset-seed, but then reproducibility and repeatability would be lost.

I propose that the node_id of the test (and the test run seed) be used to deterministically generate a unique seed for that test (instead of all tests being reset to the same fixed value).
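As a rough sketch of what this could look like (illustrative only, not pytest-randomly's actual internals): hash the run seed together with the test's node ID, so the per-test seed is deterministic for a given --randomly-seed but different for every test.

import hashlib

def derive_test_seed(run_seed: int, node_id: str) -> int:
    # Deterministic for a given run seed, but unique per test node ID.
    digest = hashlib.sha256(f"{run_seed}:{node_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

# e.g. instead of resetting every test to the same fixed value, the plugin
# could call something like: random.seed(derive_test_seed(run_seed, item.nodeid))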

@adamchainz
Member

Using a random number per test does not guarantee isolation in remote systems. The birthday problem makes the chance of collision between many tests high, at least for 32-bit identifiers.

Why not use a shared counter between tests to absolutely guarantee unique numbers?

I am not exactly against the proposal though. It might make sense for the reason that unique random seeds will make a test suite use more random data, increasing the amount of fuzz testing.

@brycedrennan
Author

Actual scenario I encountered:
Each test run spins up a new Postgres database and populates it with the correct schema. About ten tests need "users" in the database to work. The test fixture generates a random email address of the form random-first-name.random-last-name.random-number(1-100000).

Even accounting for the birthday problem, the chances of collision are very small.

If the email addresses are unique then understanding the logs is a lot easier. You just look for the email address that relates to the test. You could even keep the database around and look at records related to that account. But if the "random" email addresses are the same in every test, that's unexpected and makes debugging a lot harder.
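A sketch of the kind of fixture described (the names and wordlists here are made up for illustration):

import random

import pytest

FIRST_NAMES = ["alice", "bob", "carol"]
LAST_NAMES = ["smith", "jones", "lee"]

@pytest.fixture
def user_email():
    # With a unique seed per test these addresses differ between tests;
    # with the current behaviour every test generates the same "random" one.
    first = random.choice(FIRST_NAMES)
    last = random.choice(LAST_NAMES)
    number = random.randint(1, 100000)
    return f"{first}.{last}.{number}@example.com"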

@bcm-at-zama

bcm-at-zama commented Mar 18, 2024

I think I have the same request as @brycedrennan, but if not, tell me and I can open another issue. I would like to be able to run with --randomly-seed=1976373286, but have the seed actually used for each test test_xxx_i of each file path_j / file_j depend on:

  • randomly seed
  • test_xxx_i string
  • path_j / file_j

I think this allows tests to share similar code while making sure that the actual seeds used in them are different, for more test coverage.

It's basically what I had done in our autoseeding_of_everything function (https://github.com/zama-ai/concrete-ml/blob/main/conftest.py#L171), before I realized that pytest-randomly somehow overwrites my seeding just afterwards, in its

def pytest_runtest_call(item: Item) -> None:
    if item.config.getoption("randomly_reset_seed"):
        _reseed(item.config)

I would be happy to participate or open a PR for that; I'll just need a bit of guidance to make the right implementation choice.

@bcm-at-zama

OK so actually, what I did with autoseeding_of_everything, combined with --randomly-dont-reset-seed, seems to do the job! We can:

  • seed everything
  • make all seeds used in practice depend on the filename / the function name / the parameters, so that even if you reuse the same kind of test body for different usages, the PRNG state used in practice will differ between them (see the sketch below)

If requested, I'll be happy to try to add it to the project!
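A simplified sketch of that workaround (not the actual concrete-ml conftest.py; reading the option as "randomly_seed" is an assumption about pytest-randomly's --randomly-seed destination), meant to be run together with --randomly-dont-reset-seed:

# conftest.py
import hashlib
import random

import pytest

@pytest.fixture(autouse=True)
def seed_everything(request):
    # Mix the base seed (e.g. the value passed via --randomly-seed) with the
    # node ID (file path, test name, parameters) so similar test bodies still
    # exercise different PRNG streams.
    base_seed = request.config.getoption("randomly_seed")
    digest = hashlib.sha256(f"{base_seed}:{request.node.nodeid}".encode()).digest()
    random.seed(int.from_bytes(digest[:8], "big"))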

@adamchainz
Member

@bcm-at-zama That sounds like roughly the same request as this one. Rather than combining the filename and test name as you've done in your fixture, we should use the pytest “test ID”, which already includes the path, plus extra details like test parameters IIRC.
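For reference, a pytest test ID (node ID) looks like tests/test_users.py::TestLogin::test_valid_password[admin] — the file path, any class, the test function name, and the parametrization ID.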

I agree with @brycedrennan that we can actually ignore the birthday problem. I still don't like the idea of trying to rely on differing random seeds to avoid collisions. But it does sound like a good idea to increase test coverage.

@bcm-at-zama , would you like to try creating a PR? Please include tests, a changelog note, and any relevant updates to the README.

@brycedrennan
Author

@adamchainz yeah, it's a good point. A global counter in a fixture (possibly combined with some more random data) would be a more thorough solution. On the other hand, tests are sometimes distributed across machines, where a global counter would be a bit tricky to implement, so it's nice to have options.
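An illustrative sketch of the counter-based alternative (not part of pytest-randomly):

import itertools

import pytest

_counter = itertools.count(1)

@pytest.fixture
def unique_id():
    # Strictly increasing within a single pytest process; under pytest-xdist
    # each worker has its own counter, so extra disambiguation would be needed.
    return next(_counter)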

@bcm-at-zama

Great, @adamchainz: I don't know about the test ID, but I'll have a look. I'll try to make this PR in April!
