Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: deflake Test/LDSWatch_PartialValid #5552

Merged
merged 1 commit into from Aug 3, 2022

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Aug 2, 2022

Fixes #5546

I've seen a couple of flakes in this test:

  1. The go-control-plane server continuously resends the same resources if the client NACKs it. This means that our tests which verify some sort of NACKing behavior will continuously receive the same resources again and again. This means that the watch callback will be called an arbitrary number of times. This causes testutils.Channel.Send() to block because there is no buffer on the channel. Allocating a buffer for this channel is not an option since we do not know how big a buffer we would need. Instead using SendContext() would ensure that the call will return when the test's context is cancelled.
  2. envconfig.XDSFederation is to true at the top of every test and a function is deferred to set it back to its original value. The xdsclient also reads this var at multiple places, and if it so happens that the xdsclient receives a resource at the time the test is cleaning up, we end up with a data race. I don't know what would be the best way to fix this (or if there is any easy way out at all). In this PR, I'm switching to using t.Cleanup() instead of defer to see if it helps in anyway.

RELEASE NOTES: none

@easwars easwars added this to the 1.49 Release milestone Aug 2, 2022
@easwars easwars requested a review from zasweq August 2, 2022 16:10
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

oldFed := envconfig.XDSFederation
envconfig.XDSFederation = true
return func() { envconfig.XDSFederation = oldFed }
t.Cleanup(func() { envconfig.XDSFederation = oldFed })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this will help, but hopefully it will and no harm done. The documentation says that "Cleanup registers a function to be called when the test and all its subtests complete." Hopefully there is some guarantee there that the xdsclient will be destructed at that point, but feels like it's the same as what we have currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more out of hope than anything else. The other issue that I have mentioned in the description is a real flake and the fix in this PR definitely addresses that. Thanks.

@zasweq zasweq assigned easwars and unassigned zasweq Aug 2, 2022
@easwars easwars merged commit b89f49b into grpc:master Aug 3, 2022
@easwars easwars deleted the LDSWatch_PartialValid branch August 3, 2022 15:52
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky Test: Test/LDSWatch_PartialValid
2 participants