
chore: add randomized CI matrix for PR tests #2534

Merged: 2 commits were merged into pgjdbc:master from the random_ci branch on Jun 10, 2022

Conversation

@vlsi (Member) commented Jun 5, 2022

See https://github.com/vlsi/github-actions-random-matrix
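
For readers who do not want to follow the link, here is a minimal sketch of the idea behind a weighted random matrix (illustration only; the axis names and weights below are made up, and the real matrix.js in vlsi/github-actions-random-matrix is more elaborate):

```js
// Sketch: draw one value per axis, honoring per-value weights, and emit the
// result as a GitHub Actions "include" list. Axis names/weights are made up.
function pickWeighted(values, rng = Math.random) {
  const total = values.reduce((sum, v) => sum + (v.weight ?? 1), 0);
  let roll = rng() * total;
  for (const v of values) {
    roll -= (v.weight ?? 1);
    if (roll <= 0) return v.value;
  }
  return values[values.length - 1].value; // guard against rounding
}

const axes = {
  pg:   [{value: '8.4', weight: 2}, {value: '13', weight: 10}],
  java: [{value: '8', weight: 5}, {value: '17', weight: 10}],
  tz:   [{value: 'UTC', weight: 10}, {value: 'Pacific/Chatham', weight: 5}],
};

// Seven random jobs per CI run, as discussed below.
const include = Array.from({length: 7}, () =>
  Object.fromEntries(
    Object.entries(axes).map(([name, values]) => [name, pickWeighted(values)])
  )
);
console.log(JSON.stringify({include}, null, 2));
```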

Sample finding: #2533

TODO:

@vlsi vlsi force-pushed the random_ci branch 24 times, most recently from 2b7fdb9 to 7762bb3 Compare June 6, 2022 07:53
@vlsi (Member, Author) commented Jun 6, 2022

Here's a weird failure:

https://github.com/pgjdbc/pgjdbc/runs/6751799845?check_suite_focus=true#step:8:668

    java.time.format.DateTimeParseException: Text '1582-09-30T12:44:08+12:13:48:00' could not be parsed, unparsed text found at index 28
        at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2049)
        at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1948)
        at java.base/java.time.OffsetDateTime.parse(OffsetDateTime.java:402)
        at java.base/java.time.OffsetDateTime.parse(OffsetDateTime.java:387)
        at org.postgresql.test.jdbc42.SetObject310Test.offsetTimestamps(SetObject310Test.java:329)
        at org.postgresql.test.jdbc42.SetObject310Test.testSetOffsetDateTime(SetObject310Test.java:260)

@vlsi vlsi force-pushed the random_ci branch 5 times, most recently from 8c747c9 to c0b5e0d Compare June 6, 2022 16:13
@vlsi (Member, Author) commented Jun 6, 2022

If your intention is to have a bunch of random environments tested all the time, schedule the action to run automatically

My intention is to increase test coverage WITHOUT boiling the ocean (i.e. without scheduling CI jobs at 5-minute intervals).

In other words, I want to keep the number of CI jobs the same, and I want to increase test coverage.

Fixing the seed to a PR number kills the idea of increasing test coverage.

@sehrope (Member) commented Jun 9, 2022

You still have a bunch of unrelated stuff. There's no reason anything outside of the .github/workflows folder should be a part of that same commit.

@sehrope (Member) commented Jun 9, 2022

Please show me that in ANY of the existing PRs.

This very PR has that situation. When you started testing the randomized builds you found a combination involving query mode etc that failed. If that came up in a PR you'd want to re-test that combination again to ensure it's fixed. You wouldn't want the problem to just disappear because a different environment got tested on the next random build.

@vlsi (Member, Author) commented Jun 9, 2022

This very PR has that situation. When you started testing the randomized builds you found a combination involving query mode etc that failed. If that came up in a PR you'd want to re-test that combination again to ensure it's fixed. You wouldn't want the problem to just disappear because a different environment got tested on the next random build

Absolutely not.
When I found a case that failed, I either tested it LOCALLY (I started the Docker environment with the same parameters), or I just pushed the fix and the randomizer DID generate a similar case again.

At no point did I want to re-run the job with the same seed.

@vlsi (Member, Author) commented Jun 9, 2022

When you started testing the randomized builds

At the same time, if the seed were bound to the PR number, I would not have generated different sets of parameters, and I might have missed certain combinations.

So the current PR itself is strong evidence that linking the seed to the PR number is worse than randomizing it.

@sehrope (Member) commented Jun 9, 2022

At no point did I want to re-run the job with the same seed.

As a maintainer of this project, if I see that something failed in CI then I'd want to see that same combination executed again to show that it's fixed. Wouldn't you?

@sehrope (Member) commented Jun 9, 2022

I think this conversation is going in circles. I'm going to wait for someone else to chime in.

@vlsi (Member, Author) commented Jun 9, 2022

As a maintainer of this project, if I see that something failed in CI then I'd want to see that same combination executed again to show that it's fixed. Wouldn't you?

As a maintainer of the project, I would rather spend time on ...

Of course, I agree it might be useful to reproduce exactly the same CI run. However, I am sure that reproducing a CI execution in CI itself is FAR from what we actually need.
At the same time, the suggested CI changes are obvious improvements when compared to what we have now. If it turns out we would need to reproduce seeds on a regular basis, then we could discuss what to do about it. Other than that, I do not really understand why we spend so much time on such a discussion.

@svendiedrichsen (Contributor)

IMHO randomness in tests is always a pain in the a... You can never be sure if you can trust the test results.

@vlsi (Member, Author) commented Jun 9, 2022

I improved "Starting PostgreSQL" logging so it prints an exact command to create the same environment locally

Run echo 'Starting PostgreSQL via docker-compose down; PGV=8.4 TZ=UTC XA=yes SSL=no SCRAM=no CREATE_REPLICAS=no docker-compose up'
Starting PostgreSQL via docker-compose down; PGV=8.4 TZ=UTC XA=yes SSL=no SCRAM=no CREATE_REPLICAS=no docker-compose up

(Screenshot: sample GitHub Actions UI)

I am sure it would cover 99% of the cases when CI failure needs to be reproduced.
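
As an aside, here is a hypothetical sketch of how such a reproduction command could be assembled from a matrix entry (the field names below are assumptions for illustration, not the actual workflow code):

```js
// Hypothetical: build the exact docker-compose command printed in the CI log
// from a generated matrix entry, so a failing environment can be recreated
// locally. Field names mirror the log line above but are illustrative only.
function startCommand(entry) {
  const env = [
    `PGV=${entry.pg}`,
    `TZ=${entry.tz}`,
    `XA=${entry.xa}`,
    `SSL=${entry.ssl}`,
    `SCRAM=${entry.scram}`,
    `CREATE_REPLICAS=${entry.replicas}`,
  ].join(' ');
  return `docker-compose down; ${env} docker-compose up`;
}

console.log('Starting PostgreSQL via ' + startCommand({
  pg: '8.4', tz: 'UTC', xa: 'yes', ssl: 'no', scram: 'no', replicas: 'no',
}));
```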

Having the same versions avoids installing Gradle twice in CI.
@vlsi vlsi force-pushed the random_ci branch 2 times, most recently from beec576 to 9b22089 Compare June 10, 2022 14:24
@vlsi vlsi added this to the 42.4.1 milestone Jun 10, 2022
@vlsi vlsi merged commit f0ba632 into pgjdbc:master Jun 10, 2022
@sehrope (Member) commented Jun 10, 2022

As a maintainer of the project, I would rather spend time on ...

That's not a valid justification for deliberately ignoring feedback. We discussed making the random matrix consistent. I offered and wrote a patch for it the same day. You ignored it and railroaded in your code despite outstanding objections.

And, even though I repeatedly mentioned splitting out the commits, you left in pieces that make unrelated changes: f0ba632#diff-c0dfa6bc7a8685217f70a860145fbdf416d449eaff052fa28352c5cec1a98c06R554-R556

That's just sloppy and it makes changelogs and bisection an unnecessary pain.

At the same time, the suggested CI changes are obvious improvements when compared to what we have now.

That's a straw man. Nobody is claiming it's not an improvement.

The addition to make it consistent per-PR is not hypothetical either. I wrote the patch for it so that it could be cherry-picked into your PR.

If it turns out we would need to reproduce seeds on a regular basis, then we could discuss what to do about it.

We did discuss it. There's even others commenting that it's a good idea.

I improved "Starting PostgreSQL" logging so it prints an exact command to create the same environment locally
I am sure it would cover 99% of the cases when CI failure needs to be reproduced.

Well you hit the 1% immediately as the exact situation I described occurred with your last push to this PR.

The most recent CI run for this PR shows failure for one of the matrix options. I'm sure you tested it locally, but as a reviewer I'd have liked to see that resolved without having to recreate the test environment. That's the whole point of CI, so one does not have to manually run things. And that's why having a deterministic matrix per-PR is so important.

Other than that, I do not really understand why we spend so much time on such a discussion.

Because as a maintainer I hold you to a higher standard than anyone else submitting code to this repo. If it was a random submitter I'd just rework the code myself or force push to the PR branch. But I know you are more than capable of merging in a patch to your PR yourself. Heck, you're a smart guy and would probably find some improvement in it too.

If you want to take the patch I put together to make the PR randomization deterministic, go for it. Otherwise I'm going to put together a PR for it some time next week.

@vlsi (Member, Author) commented Jun 10, 2022

And, even though I repeatedly mentioned splitting out the commits, you left in pieces that make unrelated changes: f0ba632#diff-c0dfa6bc7a8685217f70a860145fbdf416d449eaff052fa28352c5cec1a98c06R554-R556

The JUnit property is needed since JUnit fails when running in the tr_TR locale.
Apparently, it is related to this PR since the PR added both the tr_TR tests and the related fixes (the JUnit property, the Gradle Enterprise plugin bump, etc.).

@vlsi (Member, Author) commented Jun 11, 2022

Well you hit the 1% immediately as the exact situation I described occurred with your last push to this PR.

The most recent CI run for this PR shows failure for one of the matrix options. I'm sure you tested it locally, but as a reviewer I'd have liked to see that resolved without having to recreate the test environment. That's the whole point of CI, so one does not have to manually run things. And that's why having a deterministic matrix per-PR is so important.

@sehrope , "deterministic matrix" won't really help for diagnosing the case.
Apparently, org.postgresql.replication.LogicalReplicationStatusTest is flaky: it fails from time to time.
If only 1 out of 7 jobs executes the replication tests, then the probability of hitting the failure in a given run is extremely low, and we would wait a significant time for the other tests to run.

I would say that the proper tool here is amplification of the corner cases rather than retrying with seeds.
See how #2545 changes the matrix to "replication tests only" by commenting out just a couple of lines:

--- a/.github/workflows/matrix.js
+++ b/.github/workflows/matrix.js
@@ -127,7 +127,7 @@ matrix.addAxis({
   values: [
     {value: 'yes', weight: 10},
 // Replication requires PG 9.6+
-    {value: 'no', weight: 10},
+//    {value: 'no', weight: 10},
   ]
 });

@@ -247,7 +247,7 @@ include.forEach(v => {

   let includeTestTags = [];
   // See https://junit.org/junit5/docs/current/user-guide/#running-tests-tag-expressions
-  includeTestTags.push('none()'); // untagged tests
+  //includeTestTags.push('none()'); // untagged tests

   if (v.replication === 'yes') {
       includeTestTags.push('org.postgresql.test.Replication');

It spawns 7 jobs, all of which execute just the replication tests, so they are faster than a typical CI run, and we can see that LogicalReplicationStatusTest fails in ~1 out of 7 cases.
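
A back-of-envelope illustration of the probability argument above, using the numbers from this thread (roughly 1 of 7 jobs runs replication tests in a regular run, and the flake hits ~1 in 7 replication runs); this is a sketch with assumed numbers, not a measurement:

```js
// Rough illustration only; both probabilities are assumptions from this thread.
const pFlake = 1 / 7; // chance a single replication-test job hits the flake

// Regular matrix: roughly one of the 7 jobs runs replication tests.
const pRegular = pFlake;                         // ~0.14 per CI run
// Amplified matrix (#2545): all 7 jobs run replication tests.
const pAmplified = 1 - Math.pow(1 - pFlake, 7);  // ~0.66 per CI run

console.log(`regular:   P(flake surfaces) ~ ${pRegular.toFixed(2)}, expected runs ~ ${(1 / pRegular).toFixed(1)}`);
console.log(`amplified: P(flake surfaces) ~ ${pAmplified.toFixed(2)}, expected runs ~ ${(1 / pAmplified).toFixed(1)}`);
```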

So, fixing the seed has no use for analyzing/resolving org.postgresql.replication.LogicalReplicationStatusTest.

PS. Please let me know if you bump into real issues with the randomized matrix; however, I am quite sure CI should be randomized by default, and fine-grained control should be implemented via commit messages, GitHub labels, and/or GitHub comments.

@sehrope (Member) commented Jan 3, 2024

PS. Please let me know if you bump into real issues with the randomized matrix; however, I am quite sure CI should be randomized by default, and fine-grained control should be implemented via commit messages, GitHub labels, and/or GitHub comments.

Here's another example of why randomizing the matrix on every push is a terrible idea:

https://github.com/pgjdbc/pgjdbc/actions/runs/7391190259

vs

https://github.com/pgjdbc/pgjdbc/actions/runs/7391190259

The first run hung for a specific combination (the last matrix one that shows 45m of run time and then cancellation). I pushed what I think is a fix, but I'm not entirely sure it was tested against the broken matrix entry because the entire thing is randomized on every execution. The specific combination that failed was not tested with the fix.

It's exactly the situation I had previously described. If it was deterministic (based on the PR # like I had originally suggested), it'd have tried the same failing combination again and we'd see that it succeeded (or failed!).

I'm not saying don't randomize the CI, I'm saying it should be consistent within the same PR.

@vlsi (Member, Author) commented Jan 3, 2024

@sehrope, you provided identical links.
Then, you could just temporarily increase the number of jobs in your PR to trigger more cases.

I have been pushing many commits to many PRs (pgjdbc, JMeter, TestNG), and I have seen many cases where the first push did not generate a falsifying matrix while the next ones did.

The last time I checked, the set of valid combinations was ~100,000, so binding CI to the PR number would reduce coverage.

If you absolutely care to have a consistent matrix for your PRs, you could contribute a GitHub label or something else that would alter the matrix strategy.

At the same time, it does not help when you say "terrible idea". It works well, and you do not need to blame others.

@sehrope (Member) commented Jan 3, 2024

I must have copied the wrong link. Here are the two links:

https://github.com/pgjdbc/pgjdbc/actions/runs/7391190259 (first run that fails)

https://github.com/pgjdbc/pgjdbc/actions/runs/7392914186 (fix that does not run the same combo)

Then, you could just temporarily increase the number of jobs in your PR to trigger more cases.

And what, randomly run things over and over till you get the combo from before?

For someone who keeps saying that we're "wasting cycles" and "boiling oceans" by running the same matrix twice, are you really suggesting people pick a larger random number and simply hope their combination is hit? And if it fails try again and again?

I have been pushing many commits to many PRs (pgjdbc, JMeter, TestNG), and I have seen many cases where the first push did not generate a falsifying matrix while the next ones did.

Show me an example of that happening where the combination is not something we'd want to run regularly anyway.

The last time I checked, the set of valid combinations was ~100,000, so binding CI to the PR number would reduce coverage.

I don't understand why you are so insistent on leveraging PR pushes as the source of executing CI. If we really care about having multiple random combinations executing, it should be on master, not on random PR branches being pushed at random times.

Do you disagree that it's more worthwhile to know that the master branch has been tested against all those combinations and not some random unmerged PR branch?

Just set a schedule on master to build it with N random combinations every X minutes and it'll be much better than the nonsense we have now where the testing itself is meaningless. If any of the combinations is interesting, we either add it as an explicitly included one or

If you absolutely care to have a consistent matrix for your PRs, you could contribute a GitHub label or something else that would alter the matrix strategy.

Before this PR was merged I submitted a patch that made the test matrix consistent with PRs: #2534 (comment)

You ignored it and merged in the version without it. That's despite other maintainers on this project saying it's a good idea: #2534 (comment)

Even random people chiming in on this thread said the same: #2534 (comment)

Can you find one other person that thinks that the PR matrix should be randomized per push?

At the same time, it does not help when you say "terrible idea". It works well, and you do not need to blame others.

I'm saying it's a terrible idea because it is a terrible idea. An idea is not a person. It does not work well and the exact situation I had previously outlined happened with the PasswordUtil PR.

@vlsi (Member, Author) commented Jan 3, 2024

I'm saying it's a terrible idea because it is a terrible idea

Do you mean Property-Based Testing and fuzzing are terrible ideas then?
Of course, they are not.

CI parameter randomization is an implementation of a property-based testing approach: the tests must pass no matter what the configuration parameters are (OS, Java version, and so on).
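
To make the analogy concrete, here is a tiny sketch in plain JavaScript (no particular framework implied; the environment fields are made up): the "property" is "the suite passes", and the randomly drawn environment plays the role of the generated input.

```js
// Property-based-testing analogy: try many random environments and treat any
// failure as a falsifying example. Everything here is illustrative only.
function randomEnvironment() {
  const pick = (xs) => xs[Math.floor(Math.random() * xs.length)];
  return {
    os: pick(['ubuntu', 'windows', 'macos']),
    java: pick(['8', '11', '17']),
    tz: pick(['UTC', 'Pacific/Chatham', 'America/New_York']),
  };
}

// Stand-in for "run the whole test suite in this environment".
const suitePasses = (env) => true;

for (let i = 0; i < 100; i++) {
  const env = randomEnvironment();
  if (!suitePasses(env)) {
    throw new Error(`Property violated for ${JSON.stringify(env)}`);
  }
}
console.log('Property held for 100 random environments');
```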

Can you find one other person that thinks that the PR matrix should be randomized per push?

jqwik-team/jqwik#460 (comment)

For someone who keeps saying that we're "wasting cycles" and "boiling oceans" by running the same matrix twice

Exactly. You submitted exactly one PR between 28 January 2022 and 3 January 2024, so running CI for your change a few more times is not devastating.

At the same time, the entropy case was related to the self-hosted runner, and we include it in every job execution, so any re-execution of the matrix should have reproduced the failure.

And if it fails try again and again?

If the CI detects a bug, then you fix it and push again.

I don't understand why you are so insistent on leveraging PR pushes

I want the contributor to figure out the CI failures, rather than having the maintainer do so after an incoherent fix has been merged.
Pushes to master do execute randomized CI already.

Just set a schedule on master to build it with N random combinations every X minutes

If problems with randomized parameters are uncovered only in master builds, then there's a risk that a bad PR gets merged, and then the maintainers have to figure it out and fix it.

@sehrope (Member) commented Jan 3, 2024

Do you mean Property-Based Testing and fuzzing are terrible ideas then? Of course, they are not.

I said no such thing. Now you're simply making things up that have nothing to do with this conversation.

I've said repeatedly that randomization itself is not a bad idea for picking a random set of combinations for a given PR. But once it's picked, it should remain consistent. So two different PRs might have different matrices, but all pushes to the same PR have the same matrix.

CI parameter randomization is an implementation of a property-based testing approach: the tests must pass no matter what the configuration parameters are (OS, Java version, and so on).

If the tests fail on combo A and then succeed on combo B, you do not know if they would have passed on combo A.

If the matrix is consistent and the tests pass, then there are no known breaks.

You seem to think that random testing on more environments (but without actually guaranteeing they succeeded) is more important than eliminating any known failures. In your "every push is random" model, the only matrix you can say with certainty is not failing is the most recent one.

jqwik-team/jqwik#460 (comment)

Even in the previous comment in that thread, the user is effectively asking, "How do I re-run the same failed test?": jqwik-team/jqwik#460 (comment)

An advanced user may be able to replicate it locally but that's not necessarily everybody. And the user may not even be able to replicate it locally (e.g. like this ARM example).

Exactly. You submitted exactly one PR between 28 January 2022 and 3 January 2024, so running CI for your change a few more times is not devastating.

So it's okay to boil oceans as long as it's not too often?

Don't forget the time wasted repeatedly manually triggering CI.

At the same time, the entropy case was related to the self-hosted runner, and we include it in every job execution, so any re-execution of the matrix should have reproduced the failure.

No it wouldn't because the JDK version was different on the two runs.

We don't know if the fix did anything at all or if the problem was something else on the different JDK. Testing it locally is not an option either as it just happened to be the ARM build.

Apparently, your suggestion to address this is to manually trigger the matrix over and over until that exact combination comes up again.

If the CI detects a bug, then you fix it and push again.

If the matrix is not the same, you cannot be sure your fix addressed the original failure. Maybe it didn't run at all. Maybe it only worked because the JDK changed. Maybe it only worked because some other flag was set.

If the matrix is consistent for a PR then you can definitively say, "The test that previously failed no longer fails".

I want the contributor to figure out the CI failures, rather than having the maintainer do so after an incoherent fix has been merged.
Pushes to master do execute randomized CI already.

Then you should want the contributor to be able to definitively say, "My PR does not have failing tests". You can't do that if every single push is against a different matrix. There's no guarantee it wouldn't have failed against the prior matrix.

If problems with randomized parameters are uncovered only in master builds, then there's a risk that a bad PR gets merged, and then the maintainers have to figure it out and fix it.

Which can happen anyway. It's not like PRs are testing every possible permutation. The risk of an unforeseen break or combination will always exist.

Just because K permutations were run in a PR does not mean that M permutations in master are going to pass. Which is why we have a separate set of random tests on master. I forget if we have it set to run on a schedule but we should just have it go every morning on HEAD. That's what that omni workflow was supposed to do.

@davecramer (Member)

There are merits to both sides of this discussion.
On one hand, if something breaks from one push to the next due to randomization, then we have found a bug. However, once we have a failure it would be nice to be able to replicate exactly the same conditions that exposed it; otherwise, if we randomize the next run, it may pass simply due to a different set of conditions as opposed to the root cause actually being fixed.
Is there a way to force the same conditions from one PR to another after a failing test?

@sehrope (Member) commented Jan 3, 2024

Is there a way to force the same conditions from one PR to another after a failing test?

Maybe if we persisted the random seed or matrix combination after failure. But that seems like a lot of complexity for questionable gain.

Plus there's the situation where you push and test on combo A and everything passes, and then you push changes and it runs on combo B. The "success" from combo A isn't really relevant anymore, as it's now yet another untested combination (for the latest push). So even in the CI success case you want it to be consistent if new commits are pushed.

So randomization is fine for each PR as long as it's fixed for the life of the PR. The patch I had put together did that by using a deterministic random generator seeded by the PR number. If it's not a PR, it picks a truly random value each time (e.g. running on master).
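
For completeness, here is a minimal sketch of what such seeding could look like (my own illustration with a mulberry32-style PRNG, not the actual patch; the GITHUB_REF parsing is an assumption about how the PR number would be obtained):

```js
// Sketch only: deterministic PRNG seeded from the PR number for pull_request
// builds, truly random seed otherwise (e.g. pushes to master).
function mulberry32(a) {
  return function () {
    let t = (a += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// For pull_request events GITHUB_REF typically looks like "refs/pull/2534/merge".
const prMatch = (process.env.GITHUB_REF || '').match(/^refs\/pull\/(\d+)\//);
const seed = prMatch
  ? Number(prMatch[1])                   // stable for the life of the PR
  : Math.floor(Math.random() * 2 ** 31); // random everywhere else
const rng = mulberry32(seed);

console.log(`seed=${seed}`, [rng(), rng(), rng()]); // same PR => same sequence
```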
