profiler: fix TestStopLatency (take 2) #1307

nsrip-dd · 2022-05-25T19:00:45Z

PR #1297 attempted to fix the flakiness noted in issue #1294 by creating
two separate tests: one which runs a long profile to test the latency of
stopping the profile, and another which runs short profiles and makes
uploading hang indefinitely. However, the upload test had such a short
profiling period that the inner select statement in (*profiler).collect
could take several iterations to actually cancel due to the Go runtime
randomly choosing select cases when multiple cases are ready.
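
The select behavior described above can be reproduced in isolation. The following is a minimal sketch, not the profiler's actual collect loop: the `work` channel is a hypothetical stand-in for a ticker with a very short period, and `iterationsUntilExit` is an illustrative name. Because both cases are always ready, the Go runtime picks one pseudo-randomly, so the exit case may only be observed after several iterations.

```go
package main

import "fmt"

// iterationsUntilExit counts how many times a loop executes a select in
// which both a "work" case and an "exit" case are ready before the exit
// case is finally chosen.
func iterationsUntilExit() int {
	exit := make(chan struct{})
	close(exit) // receiving from a closed channel is always ready

	work := make(chan struct{})
	close(work) // hypothetical stand-in for a ticker that has always fired

	n := 0
	for {
		n++
		select {
		case <-work:
			// With a very short period this case stays ready, so it
			// competes with the exit case on every iteration.
		case <-exit:
			return n
		}
	}
}

func main() {
	maxIter := 0
	for i := 0; i < 1000; i++ {
		if n := iterationsUntilExit(); n > maxIter {
			maxIter = n
		}
	}
	fmt.Println("most iterations needed before exit was observed:", maxIter)
}
```

The exact count varies from run to run, which is the kind of nondeterminism that can turn a tight latency bound into a flaky assertion.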

In addition, the "stop-profiler" case didn't test what it was intended
to test, since profiling doesn't run until one full profiling period has
elapsed. Since the period was set to an hour, Stop was called without
profiling having actually started.

Merge the two tests back into one. This brings us full circle back to
the original test, but with a more generous window on how long stopping
should take and without relying on modifying internal functions to make
the test work.

nsrip-dd · 2022-05-25T19:02:10Z

profiler/upload.go

@@ -88,6 +88,9 @@ func (p *profiler) doRequest(bat batch) error {
 		cancel()
 	}()
 	req, err := http.NewRequestWithContext(ctx, "POST", p.cfg.targetURL, body)
+	if err != nil {


This is an unrelated change, but I noticed there was no error check while I was investigating whether stopping the upload was actually the inconsistent part leading to test flakes.

felixge

LGTM, see comments.

felixge · 2022-05-30T08:47:08Z

profiler/profiler_test.go

-		// serialization.
-		if elapsed > 500*time.Millisecond {
-			t.Errorf("profiler took %v to stop", elapsed)
+	received := make(chan struct{})


NIT: Maybe give this channel a capacity of 1 to avoid the very unlikely race of <-received on line 206 getting stuck because the HTTP handler fired much earlier than expected. (Again: extremely unlikely.)

I think it's safe to have the channel unbuffered. The handler should be called repeatedly, once per profiling period. If the handler somehow fired before the main goroutine got to line 206, it would just fire again the next profiling period.

EDIT: also, the handler does a non-blocking send to received so it's fine if the handler is called multiple times.

felixge · 2022-05-30T08:52:53Z

profiler/upload.go

@@ -88,6 +88,9 @@ func (p *profiler) doRequest(bat batch) error {
 		cancel()
 	}()
 	req, err := http.NewRequestWithContext(ctx, "POST", p.cfg.targetURL, body)
+	if err != nil {


👍 Seems like we missed this small regression in #1239 (cc @ajgajg1134). But I think this method will only return an error if we pass in invalid arguments, so it's probably not been having any impact.

nsrip-dd requested a review from a team as a code owner May 25, 2022 19:00

nsrip-dd added this to the 1.39.0 milestone May 25, 2022

felixge approved these changes May 30, 2022

Merge branch 'main' into nick.ripley/really-fix-teststoplatency

bce7d83

nsrip-dd merged commit 3528a0d into main May 31, 2022

nsrip-dd deleted the nick.ripley/really-fix-teststoplatency branch May 31, 2022 18:37

felixge mentioned this pull request Jun 1, 2022

profiler: TestStopLatency fails on nightly race detector tests #1294

Closed
