Improve bors merge reliability: Windows smoke tests are advisory and run with reduced parallelism #11471

Merged
merged 3 commits into from Dec 8, 2022

AaronFriel
Member

The `continue-on-error` flag is set to `true` for smoke tests whenever the matrix platform contains "windows". This means these jobs can fail while the workflow proceeds; however, it should substantially reduce the false negatives caused by the behavior of the CI platform. Per the response to a GitHub support ticket on my account:

> I've confirmed that our engineering team has an open backlog item focused around "protecting" the provisioning service in cases of resource starvation. I believe this will address your exact concerns so that high resource consumption does not result in an abandoned job.
>
> In the meantime, there unfortunately is not much capability exposed for affecting priority on the VM. Until improvements are implemented, our team still recommends trying to reduce load on the CPU at the point(s) in the workflow where the job consistently is getting abandoned.

I hope that setting this option, reducing Windows test parallelism, and setting timeouts appropriately will help improve CI reliability.

Our CI jobs typically complete in 35 minutes; anything longer indicates a high likelihood that the runner has failed.
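
As a rough sketch of the approach described above (not the actual workflow diff in this PR), a job combining the three settings might look like the fragment below. The workflow and job names, the matrix values, the 45-minute timeout, and the `-parallel` values are all illustrative assumptions.

```yaml
# Hypothetical workflow fragment; names and values are illustrative assumptions.
name: smoke-test-sketch
on: [pull_request]

jobs:
  smoke-test:
    strategy:
      fail-fast: false
      matrix:
        platform: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.platform }}
    # Windows jobs become advisory: a failure does not fail the workflow run.
    continue-on-error: ${{ contains(matrix.platform, 'windows') }}
    # Jobs typically finish in about 35 minutes; the 45-minute cap is an assumed value.
    timeout-minutes: 45
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.19'
      # Reduced test parallelism on Windows; both numbers are assumed.
      - run: go test -parallel ${{ contains(matrix.platform, 'windows') && 2 || 4 }} ./...
```

With a timeout set, a job whose runner has been abandoned is cancelled once the limit is reached, rather than holding up the merge queue.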

@Frassle
Member

Frassle commented Nov 26, 2022

There's literally zero value in running advisory tests in an automatic system unless we're going to write another system to collect and analyse the reasons for failures post-merge.
I don't think we're going to do that, so my suggestion is: remove Windows tests from bors completely and keep them for PR smoke tests (because humans will look at those and can hit retry on the individual jobs).

@AaronFriel
Member Author

PR tests are purely advisory as well; a failing test places no constraint on the PR being merged.

It seems we're back to square one here: running Windows tests on a cron.

@Frassle
Member

Frassle commented Nov 26, 2022

> PR tests are purely advisory as well

Yes, but a human has a chance to see those. If I'm looking at my PR and see a red Windows job, I should check it out. I'll admit that, given the high noise due to job failures, it's likely people won't be very diligent about it, but there's at least some chance. Nobody but nobody is going to check bors jobs after every merge to see if the Windows test was red or not.

@AaronFriel
Member Author

I'd rather they run on every merge just so there is data if we need to investigate a regression. It may give us a data point to look at more quickly than a bisect. Odds are that it won't, but I wonder what the harm in it is.

There is a third option: @danielrbradley has been looking at using larger runners in another repo. It'll raise our CI costs above zero, but perhaps it's worth combining this change with one that runs the Windows jobs on a larger runner, to see if that's a viable option?

We can then revisit the staging jobs in a week and see if a larger runner is worth paying for the rest of the year.

@danielrbradley
Member

@AaronFriel FYI, I'm testing the new runner for the next AWS Java release: pulumi/pulumi-aws#2232

AaronFriel added the ci/test label (Test CI pipelines on this PR) on Dec 4, 2022
AaronFriel added the impact/no-changelog-required label (This issue doesn't require a CHANGELOG update) on Dec 4, 2022
@pulumi-bot
Contributor

pulumi-bot commented Dec 4, 2022

Changelog

[uncommitted] (2022-12-05)

bors bot added a commit that referenced this pull request Dec 6, 2022
11533: Allow opting out of PULUMI_OPTIMIZED_CHECKPOINT_PATCH r=t0yv0 a=t0yv0

# Description

Allow users to opt out of deltaCheckpointUpdates even if the backend indicates it should be used. This remains necessary while PULUMI_OPTIMIZED_CHECKPOINT_PATCH has higher memory requirements on the client and may cause out-of-memory issues in constrained environments.

Part of https://github.com/pulumi/home/issues/2426

## Checklist

- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrants bumping the Pulumi Service API version


11537: Fix panic in mapper support for map[string]*string r=t0yv0 a=t0yv0

# Description

`map[string]*string` does not round-trip ("turn around") through the mapper: decoding the encoded value panics.

```text
go test -run TestReproduceMapStringPointerTurnaroundIssue   # cwd: ~/code/pulumi/sdk/go/common/util/mapper
--- FAIL: TestReproduceMapStringPointerTurnaroundIssue (0.00s)
    mapper_test.go:608: encodedMap: map[args:map[key:value]]
panic: reflect: Elem of invalid type string [recovered]
        panic: reflect: Elem of invalid type string

goroutine 20 [running]:
testing.tRunner.func1.2({0x11cb480, 0xc0000992b0})
        /nix/store/0c30lcag5r6ahw3qj0x7lkshpry4yqwl-go-1.19/share/go/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
        /nix/store/0c30lcag5r6ahw3qj0x7lkshpry4yqwl-go-1.19/share/go/src/testing/testing.go:1399 +0x39f
panic({0x11cb480, 0xc0000992b0})
        /nix/store/0c30lcag5r6ahw3qj0x7lkshpry4yqwl-go-1.19/share/go/src/runtime/panic.go:884 +0x212
reflect.(*rtype).Elem(0x11d1500?)
        /nix/store/0c30lcag5r6ahw3qj0x7lkshpry4yqwl-go-1.19/share/go/src/reflect/type.go:972 +0x134
github.com/pulumi/pulumi/sdk/v3/go/common/util/mapper.(*mapper).adjustValueForAssignment(0xc0000a2240, {0x11d1500?, 0xc000099290?, 0x98?}, {0x126b940?, 0x11c6160}, {0x126b940?, 0x11dd060}, {0xc0000a4a80, 0xf})
        /Users/t0yv0/code/pulumi/sdk/go/common/util/mapper/mapper_decode.go:158 +0x2ea
github.com/pulumi/pulumi/sdk/v3/go/common/util/mapper.(*mapper).adjustValueForAssignment(0xc0000a2240, {0x11d5c20?, 0xc00009d470?, 0x11de1e0?}, {0x126b940?, 0x11d5680}, {0x126b940?, 0x11dd060}, {0x11c0b24, 0x4})
        /Users/t0yv0/code/pulumi/sdk/go/common/util/mapper/mapper_decode.go:204 +0x1a15
github.com/pulumi/pulumi/sdk/v3/go/common/util/mapper.(*mapper).DecodeValue(0xc0000a2240, 0xc0000a0060?, {0x126b940?, 0x11dd060}, {0x11c0b24, 0x4}, {0xc0000aac80?, 0xc0000a0060?}, 0x1)
        /Users/t0yv0/code/pulumi/sdk/go/common/util/mapper/mapper_decode.go:117 +0x8ee
github.com/pulumi/pulumi/sdk/v3/go/common/util/mapper.(*mapper).Decode(0xc0000a2240, 0xc0000a8380?, {0x11c36a0?, 0xc0000a0060?})
        /Users/t0yv0/code/pulumi/sdk/go/common/util/mapper/mapper_decode.go:55 +0x8db
github.com/pulumi/pulumi/sdk/v3/go/common/util/mapper.TestReproduceMapStringPointerTurnaroundIssue.func2(0xc000083520?)
        /Users/t0yv0/code/pulumi/sdk/go/common/util/mapper/mapper_test.go:592 +0xe9
github.com/pulumi/pulumi/sdk/v3/go/common/util/mapper.TestReproduceMapStringPointerTurnaroundIssue(0xc000083520)
        /Users/t0yv0/code/pulumi/sdk/go/common/util/mapper/mapper_test.go:610 +0x11f
testing.tRunner(0xc000083520, 0x1231f88)
        /nix/store/0c30lcag5r6ahw3qj0x7lkshpry4yqwl-go-1.19/share/go/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
        /nix/store/0c30lcag5r6ahw3qj0x7lkshpry4yqwl-go-1.19/share/go/src/testing/testing.go:1493 +0x35f
exit status 2
FAIL    github.com/pulumi/pulumi/sdk/v3/go/common/util/mapper   0.674s
```


Fixes # (issue)

## Checklist

- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have run `make changelog` and committed the `changelog/pending/<file>` documenting my change
- [ ] Yes, there are changes in this PR that warrants bumping the Pulumi Service API version


11539: Add `*[TYPE]` to `[TYPE]Ptr` methods r=aq17 a=aq17

Fixes #11536
Adds methods to convert pointer types to corresponding pulumi Ptr types

11540: ci: Remove page file tweak for Windows r=aq17 a=AaronFriel

The step to [configure the pagefile on Windows](https://github.com/pulumi/pulumi/actions/runs/3624853251/jobs/6112393713) began failing due to an update to the underlying runner image. Given that this is blocking CI, we'll remove the step for now.

Either #11532 or #11471 should address performance of integration tests and reliability and obviate the need for this step.

Co-authored-by: Anton Tayanovskyy <anton@pulumi.com>
Co-authored-by: aq17 <aqiu@pulumi.com>
Co-authored-by: Aaron Friel <mayreply@aaronfriel.com>
bors bot added a commit that referenced this pull request Dec 6, 2022
11539: Add `*[TYPE]` to `[TYPE]Ptr` methods r=aq17 a=aq17

Fixes #11536
Adds methods to convert pointer types to corresponding pulumi Ptr types

11540: ci: Remove page file tweak for Windows r=aq17 a=AaronFriel

The step to [configure the pagefile on Windows](https://github.com/pulumi/pulumi/actions/runs/3624853251/jobs/6112393713) began failing due to an update to the underlying runner image. Given that this is blocking CI, we'll remove the step for now.

Either #11532 or #11471 should address performance of integration tests and reliability and obviate the need for this step.

Co-authored-by: aq17 <aqiu@pulumi.com>
Co-authored-by: Aaron Friel <mayreply@aaronfriel.com>
Member

@Frassle Frassle left a comment

This is all very lame but seems better than doing nothing right now

@AaronFriel
Member Author

bors merge

@bors
Contributor

bors bot commented Dec 8, 2022

Build succeeded:

@bors bors bot merged commit f43f549 into master Dec 8, 2022
@bors bors bot deleted the friel/allow-win-fail branch December 8, 2022 20:25