-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
profiler: fix TestStopLatency (take 2) #1307
Conversation
PR #1297 attempted to fix the flakiness noted in issue #1294 by creating two seperate tests: one which runs a long profile to test the latency of stopping the profile, and another which runs short profiles and makes uploading hang indefinitely. However, the upload test had such a short profiling period that the inner select statement in (*profiler).collect could take several iterations to actually cancel due to the Go runtime randomly choosing select cases when multiple cases are ready. In addition, the "stop-profiler" case didn't actually test what it was intended to test since profiling doesn't actually run until one full profiling period has elapsed. Since the period was set to an hour, Stop was called withouth profiling actually started. Merge the two tests back into one. This brings us full-circle back to the original test, but with a more generous window on how long stopping should take and without relying on modifying internal functions to make the test work.
@@ -88,6 +88,9 @@ func (p *profiler) doRequest(bat batch) error { | |||
cancel() | |||
}() | |||
req, err := http.NewRequestWithContext(ctx, "POST", p.cfg.targetURL, body) | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unrelated change, but I noticed there was no error check while I was investigating whether stopping the upload was actually the inconsistent part leading to test flakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Seems like we missed this small regression in #1239 (cc @ajgajg1134). But I think this method will only return an error if we pass in invalid arguments, so it's probably not been having any impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, see comments.
// serialization. | ||
if elapsed > 500*time.Millisecond { | ||
t.Errorf("profiler took %v to stop", elapsed) | ||
received := make(chan struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Maybe give this channel a capacity of 1 to avoid the very unlikely race of <-received
on line 206 getting stuck because the http handler fired much earlier than expected. (Again: Extremely unlikely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safe to have the channel unbuffered. The handler should be called repeatedly, once per profiling period. If the handler somehow fired before the main goroutine got to line 206, it would just fire again the next profiling period.
EDIT: also, the handler does a non-blocking send to received
so it's fine if the handler is called multiple times.
@@ -88,6 +88,9 @@ func (p *profiler) doRequest(bat batch) error { | |||
cancel() | |||
}() | |||
req, err := http.NewRequestWithContext(ctx, "POST", p.cfg.targetURL, body) | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Seems like we missed this small regression in #1239 (cc @ajgajg1134). But I think this method will only return an error if we pass in invalid arguments, so it's probably not been having any impact.
PR #1297 attempted to fix the flakiness noted in issue #1294 by creating
two seperate tests: one which runs a long profile to test the latency of
stopping the profile, and another which runs short profiles and makes
uploading hang indefinitely. However, the upload test had such a short
profiling period that the inner select statement in (*profiler).collect
could take several iterations to actually cancel due to the Go runtime
randomly choosing select cases when multiple cases are ready.
In addition, the "stop-profiler" case didn't actually test what it was
intended to test since profiling doesn't actually run until one full
profiling period has elapsed. Since the period was set to an hour, Stop
was called withouth profiling actually started.
Merge the two tests back into one. This brings us full-circle back to
the original test, but with a more generous window on how long stopping
should take and without relying on modifying internal functions to make
the test work.