chore: add randomized CI matrix for PR tests #2534
Here's a weird failure: https://github.com/pgjdbc/pgjdbc/runs/6751799845?check_suite_focus=true#step:8:668
My intention is to increase test coverage WITHOUT boiling the ocean by scheduling CI jobs with 5 minute intervals. In other words, I want to keep the number of CI jobs the same, and I want to increase test coverage. Fixing the seed to a PR number kills the idea of increasing test coverage. |
You still have a bunch of unrelated stuff. There's no reason anything outside of the .github/workflows folder should be a part of that same commit. |
This very PR has that situation. When you started testing the randomized builds you found a combination involving query mode etc that failed. If that came up in a PR you'd want to re-test that combination again to ensure it's fixed. You wouldn't want the problem to just disappear because a different environment got tested on the next random build. |
Absolutely not. At no point did I want to re-run the job with the same seed. |
At the same time, if the seed were bound to a PR number, I would fail to generate a different set of parameters, and I might miss certain combinations. So the current PR is a piece of strong evidence that linking the seed to the PR number is worse than randomizing it. |
As a maintainer of this project, if I see that something failed in CI then I'd want to see that same combination executed again to show that it's fixed. Wouldn't you? |
I think this conversation is going in circles. I'm going to wait for someone else to chime in. |
As a maintainer of the project, I would rather spend time on
Of course, I agree it might be useful to reproduce exactly the same CI. However, I am sure the need to reproduce CI execution in CI is FAR from what we would actually need. |
IMHO randomness in tests is always a pain in the a... You can never be sure if you can trust the test results. |
I improved "Starting PostgreSQL" logging so it prints an exact command to create the same environment locally
I am sure it would cover 99% of the cases when CI failure needs to be reproduced. |
Having the same versions avoids installing Gradle twice at CI.
That's not a valid justification for deliberately ignoring feedback. We discussed making the random matrix consistent. I offered and wrote a patch for it the same day. You ignored it and railroaded in your code despite outstanding objections. And, even though I repeatedly mentioned splitting out the commits, you left in pieces that make unrelated changes: f0ba632#diff-c0dfa6bc7a8685217f70a860145fbdf416d449eaff052fa28352c5cec1a98c06R554-R556 That's just sloppy and it makes changelogs and bisection an unnecessary pain.
That's a straw man. Nobody is claiming it's not an improvement. The addition to make it consistent per-PR is not hypothetical either. I wrote the patch for it so that it could be cherry-picked into your PR.
We did discuss it. There's even others commenting that it's a good idea.
Well you hit the 1% immediately as the exact situation I described occurred with your last push to this PR. The most recent CI run for this PR shows failure for one of the matrix options. I'm sure you tested it locally, but as a reviewer I'd have liked to see that resolved without having to recreate the test environment. That's the whole point of CI, so one does not have to manually run things. And that's why having a deterministic matrix per-PR is so important.
Because as a maintainer I hold you to a higher standard than anyone else submitting code to this repo. If it was a random submitter I'd just rework the code myself or force push to the PR branch. But I know you are more than capable of merging in a patch to your PR yourself. Heck, you're a smart guy and would probably find some improvement in it too. If you want to take the patch I put together to make the PR randomization deterministic, go for it. Otherwise I'm going to put together a PR for it some time next week. |
The JUnit property is needed since JUnit fails when running in the tr_TR locale. |
@sehrope, "deterministic matrix" won't really help for diagnosing the case. I would say that the proper tool here is amplification of the corner cases rather than retrying with seeds.
--- a/.github/workflows/matrix.js
+++ b/.github/workflows/matrix.js
@@ -127,7 +127,7 @@ matrix.addAxis({
values: [
{value: 'yes', weight: 10},
// Replication requires PG 9.6+
- {value: 'no', weight: 10},
+// {value: 'no', weight: 10},
]
});
@@ -247,7 +247,7 @@ include.forEach(v => {
let includeTestTags = [];
// See https://junit.org/junit5/docs/current/user-guide/#running-tests-tag-expressions
- includeTestTags.push('none()'); // untagged tests
+ //includeTestTags.push('none()'); // untagged tests
if (v.replication === 'yes') {
includeTestTags.push('org.postgresql.test.Replication');
It spawns 7 jobs where all of them execute just replication tests, so they are faster than typical CI, and we can see that So, fixing the seed has no use for analyzing/resolving PS. Please let me know if you bump into true issues with the randomized matrix. However, I am quite sure the CI should be randomized by default, and the fine-grained control should be implemented via commit messages and/or GitHub labels and/or GitHub comments. |
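The diff above edits `{value, weight}` entries on a matrix axis. As a rough illustration of why commenting out one entry amplifies the other, here is a minimal sketch of weighted selection over such entries. The function `pickWeighted` is hypothetical and is not claimed to be how `matrix.js` or github-actions-random-matrix actually implements it.

```javascript
// Hypothetical sketch: pick one entry from [{value, weight}, ...] with
// probability proportional to its weight. `random` returns a float in [0, 1).
function pickWeighted(values, random = Math.random) {
  const total = values.reduce((sum, v) => sum + v.weight, 0);
  let threshold = random() * total;
  for (const v of values) {
    threshold -= v.weight;
    if (threshold < 0) {
      return v.value;
    }
  }
  // Guard against floating-point rounding: fall back to the last entry.
  return values[values.length - 1].value;
}

// With both entries present, 'yes' and 'no' are equally likely.
const replicationAxis = [
  { value: 'yes', weight: 10 },
  { value: 'no', weight: 10 },
];
console.log(pickWeighted(replicationAxis));

// With 'no' commented out, as in the diff, every job gets 'yes'.
console.log(pickWeighted([{ value: 'yes', weight: 10 }]));
```

Under this assumption, removing an entry concentrates all jobs on the remaining value, which is the "amplification" the comment describes.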
Here's another example of why randomizing the matrix on every push is a terrible idea: https://github.com/pgjdbc/pgjdbc/actions/runs/7391190259 vs https://github.com/pgjdbc/pgjdbc/actions/runs/7391190259 The first run hung for a specific combination (the last matrix entry, which shows 45m of run time and then cancellation). I pushed what I think is a fix, but I'm not entirely sure it was tested against the broken matrix entry because the entire thing is randomized on every execution. The specific combination that failed was not tested with the fix. It's exactly the situation I had previously described. If it was deterministic (based on the PR # like I had originally suggested), it'd have tried the same failing combination again and we'd see that it succeeded (or failed!). I'm not saying don't randomize the CI, I'm saying it should be consistent within the same PR. |
@sehrope, you provided identical links. I have been pushing many commits to many PRs (pgjdbc, jmeter, testng), and I have seen many cases when the first push did not generate a falsifying matrix while the next ones did. The last time I checked, the set of valid combinations was ~100'000, so binding CI to the PR would reduce coverage. If you absolutely care to have a consistent matrix for your PRs, you could contribute a GitHub label or something else that would alter the matrix strategy. At the same time, it does not help when you say "terrible idea". It works well, and you do not need to blame others. |
I must have copied the wrong link. Here's the two links: https://github.com/pgjdbc/pgjdbc/actions/runs/7391190259 (first run that fails) https://github.com/pgjdbc/pgjdbc/actions/runs/7392914186 (fix that does not run the same combo)
And what, randomly run things over and over till you get the combo from before? For someone who keeps saying that we're "wasting cycles" and "boiling oceans" by running the same matrix twice, are you really suggesting people pick a larger random number and simply hope their combination is hit? And if it fails try again and again?
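To put a rough number on how often a re-run would land on one specific combination, here is a back-of-the-envelope sketch. It assumes uniform sampling with replacement, which the real weighted matrix does not do, so treat it as an order-of-magnitude illustration only. The figure of 100,000 combinations comes from the thread; the jobs-per-run count of 10 is a hypothetical placeholder.

```javascript
// Probability that one specific combination shows up at least once,
// assuming each job draws uniformly and independently (an assumption,
// not how the weighted matrix actually samples).
function chanceOfRepeat(totalCombinations, jobsPerRun, runs) {
  const missOnce = 1 - 1 / totalCombinations;
  return 1 - Math.pow(missOnce, jobsPerRun * runs);
}

// ~100'000 valid combinations (from the thread), 10 jobs per run (hypothetical).
console.log(chanceOfRepeat(100000, 10, 1));   // a single re-run
console.log(chanceOfRepeat(100000, 10, 100)); // one hundred re-runs
```

Under these assumptions the chance per re-run is roughly `jobsPerRun / totalCombinations`, so re-running until the same combination reappears is not a practical way to confirm a fix.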
Show me an example of that happening where the combination is not something we'd want to run regularly anyway.
I don't understand why you are so insistent on leveraging PR pushes as the source of executing CI. If we really care about having multiple random combinations executing, it should be on Do you disagree that it's more worthwhile to know that the Just set a schedule on
Before this PR was merged I submitted a patch that made the test matrix consistent with PRs: #2534 (comment) You ignored it and merged in the version without it. That's despite other maintainers on this project saying it's a good idea: #2534 (comment) Even random people chiming in on this thread said the same: #2534 (comment) Can you find one other person that thinks that the PR matrix should be randomized per push?
I'm saying it's a terrible idea because it is a terrible idea. An idea is not a person. It does not work well and the exact situation I had previously outlined happened with the PasswordUtil PR. |
Do you mean property-based testing and fuzzing are terrible ideas, then? CI parameter randomization is an implementation of the property-based testing approach: the tests must pass no matter what the configuration parameters are (OS, Java version, and so on).
jqwik-team/jqwik#460 (comment)
Exactly. You submitted exactly one PR in-between the 28th January 2022 and 3rd January 2024, so running CI for your change a few more times is not devastating. At the same time, the entropy case was related to
If the CI detects a bug, then you fix it and push again.
I want the contributor to figure out the CI failures rather than the maintainer after an incoherent fix is merged.
If we uncover randomized parameters only in |
I said no such thing. Now you're simply making things up that have nothing to do with this conversation. I've said repeatedly that randomization itself is not a bad idea for picking a random set of combinations for a given PR. But once it's picked, it should remain consistent. So two different PRs might have different matrices, but all pushes to the same PR have the same matrix.
If the tests fail on combo A and then succeed on combo B, you do not know if they would have passed on combo A. If the matrix is consistent and the tests pass, then there are no known breaks. You seem to think that random testing on more environments (but without actually guaranteeing they succeeded) is more important than eliminating any known failures. In your "every push is random" model, the only matrix you can say with certainty is not failing is the most recent one. Even in the previous comment in that thread, the user is effectively asking, "How do I re-run the same failed test?": jqwik-team/jqwik#460 (comment) An advanced user may be able to replicate it locally, but that's not necessarily everybody. And the user may not even be able to replicate it locally (e.g. like this ARM example).
So it's okay to boil oceans as long as it's not too often? Don't forget the time wasted repeatedly manually triggering CI.
No it wouldn't because the JDK version was different on the two runs. We don't know if the fix did anything at all or if the problem was something else on the different JDK. Testing it locally is not an option either as it just happened to be the ARM build. Apparently, your suggestion to address this is to manually trigger the matrix over and over until that exact combination comes up again.
If the matrix is not the same, you cannot be sure your fix addressed the original failure. Maybe it didn't run at all. Maybe it only worked because the JDK changed. Maybe it only worked because some other flag was set. If the matrix is consistent for a PR then you can definitively say, "The test that previously failed no longer fails".
Then you should want the contributor to be able to definitively say, "My PR does not have failing tests". You can't do that if every single push is against a different matrix. There's no guarantee it wouldn't have failed against the prior matrix.
Which can happen anyway. It's not like PRs are testing every possible permutation. The risk of an unforeseen break or combination will always exist. Just because K permutations were run in a PR does not mean that M permutations in master are going to pass. Which is why we have a separate set of random tests on master. I forget if we have it set to run on a schedule but we should just have it go every morning on HEAD. That's what that omni workflow was supposed to do. |
There are merits to both sides of this discussion. |
Maybe if we persisted the random seed or matrix combination after failure. But that seems like a lot of complexity for questionable gain. Plus there's the situation where you push and test on combo A and everything passed, and then you pushed changes and it ran on combo B. The "success" from combo A isn't really relevant anymore as it's now yet another untested combination (for the latest push). So even in the CI success case you want it to be consistent if new commits are pushed. So randomization is fine for each PR as long as it's fixed for the life of the PR. The patch I had put together did that by using a deterministic random generator seeded by the PR number. If it's not a PR, it picks a truly random value each time (e.g. running on master). |
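For readers who want to see the shape of that idea, here is a minimal sketch of a generator seeded by the PR number. It is not the patch referenced above: `chooseSeed`, `pickCombination`, the axis values, and the use of the mulberry32 generator are all assumptions made for illustration, and how the PR number reaches the script is left out.

```javascript
// mulberry32: a small deterministic PRNG. Returns a function yielding floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical: use the PR number as the seed when there is one,
// otherwise (e.g. a run on master) pick a fresh random seed each time.
function chooseSeed(prNumber) {
  if (Number.isInteger(prNumber) && prNumber > 0) {
    return prNumber;
  }
  return Math.floor(Math.random() * 0x100000000);
}

// Pick one value per axis using the supplied generator.
function pickCombination(axes, random) {
  const result = {};
  for (const [name, values] of Object.entries(axes)) {
    result[name] = values[Math.floor(random() * values.length)];
  }
  return result;
}

// Illustrative axes only; the real matrix has many more.
const axes = {
  java: [8, 11, 17],
  os: ['ubuntu', 'windows', 'macos'],
  replication: ['yes', 'no'],
};

const firstPush = pickCombination(axes, mulberry32(chooseSeed(2534)));
const secondPush = pickCombination(axes, mulberry32(chooseSeed(2534)));
console.log(JSON.stringify(firstPush) === JSON.stringify(secondPush)); // prints true
```

Every push to the same PR reproduces the same combination, while different PRs and non-PR runs still spread across the matrix.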
See https://github.com/vlsi/github-actions-random-matrix
Sample finding: #2533
TODO:
:pgjdbc-osgi-test:test fails when database is configured in non-ssl mode (see https://github.com/pgjdbc/pgjdbc/runs/6751972575?check_suite_focus=true#step:8:222)
replication and slow tests somehow
1582-09-30T12:44:08+12:13:48:00 parsing failure