Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiler: fix TestStopLatency (take 2) #1307

Merged
merged 2 commits into from
May 31, 2022

Conversation

nsrip-dd
Copy link
Contributor

PR #1297 attempted to fix the flakiness noted in issue #1294 by creating
two seperate tests: one which runs a long profile to test the latency of
stopping the profile, and another which runs short profiles and makes
uploading hang indefinitely. However, the upload test had such a short
profiling period that the inner select statement in (*profiler).collect
could take several iterations to actually cancel due to the Go runtime
randomly choosing select cases when multiple cases are ready.

In addition, the "stop-profiler" case didn't actually test what it was
intended to test since profiling doesn't actually run until one full
profiling period has elapsed. Since the period was set to an hour, Stop
was called withouth profiling actually started.

Merge the two tests back into one. This brings us full-circle back to
the original test, but with a more generous window on how long stopping
should take and without relying on modifying internal functions to make
the test work.

PR #1297 attempted to fix the flakiness noted in issue #1294 by creating
two seperate tests: one which runs a long profile to test the latency of
stopping the profile, and another which runs short profiles and makes
uploading hang indefinitely. However, the upload test had such a short
profiling period that the inner select statement in (*profiler).collect
could take several iterations to actually cancel due to the Go runtime
randomly choosing select cases when multiple cases are ready.

In addition, the "stop-profiler" case didn't actually test what it was
intended to test since profiling doesn't actually run until one full
profiling period has elapsed. Since the period was set to an hour, Stop
was called withouth profiling actually started.

Merge the two tests back into one. This brings us full-circle back to
the original test, but with a more generous window on how long stopping
should take and without relying on modifying internal functions to make
the test work.
@nsrip-dd nsrip-dd requested a review from a team as a code owner May 25, 2022 19:00
@nsrip-dd nsrip-dd added this to the 1.39.0 milestone May 25, 2022
@@ -88,6 +88,9 @@ func (p *profiler) doRequest(bat batch) error {
cancel()
}()
req, err := http.NewRequestWithContext(ctx, "POST", p.cfg.targetURL, body)
if err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unrelated change, but I noticed there was no error check while I was investigating whether stopping the upload was actually the inconsistent part leading to test flakes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Seems like we missed this small regression in #1239 (cc @ajgajg1134). But I think this method will only return an error if we pass in invalid arguments, so it's probably not been having any impact.

Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, see comments.

// serialization.
if elapsed > 500*time.Millisecond {
t.Errorf("profiler took %v to stop", elapsed)
received := make(chan struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Maybe give this channel a capacity of 1 to avoid the very unlikely race of <-received on line 206 getting stuck because the http handler fired much earlier than expected. (Again: Extremely unlikely)

Copy link
Contributor Author

@nsrip-dd nsrip-dd May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe to have the channel unbuffered. The handler should be called repeatedly, once per profiling period. If the handler somehow fired before the main goroutine got to line 206, it would just fire again the next profiling period.

EDIT: also, the handler does a non-blocking send to received so it's fine if the handler is called multiple times.

@@ -88,6 +88,9 @@ func (p *profiler) doRequest(bat batch) error {
cancel()
}()
req, err := http.NewRequestWithContext(ctx, "POST", p.cfg.targetURL, body)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Seems like we missed this small regression in #1239 (cc @ajgajg1134). But I think this method will only return an error if we pass in invalid arguments, so it's probably not been having any impact.

@nsrip-dd nsrip-dd merged commit 3528a0d into main May 31, 2022
@nsrip-dd nsrip-dd deleted the nick.ripley/really-fix-teststoplatency branch May 31, 2022 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants