Disable concurrency in zstd and add Benchmark tests for it #9749

Merged
merged 1 commit into from Apr 12, 2024

rnishtala-sumo
Contributor

@rnishtala-sumo rnishtala-sumo commented Mar 12, 2024

Description: zstd benchmark tests added
The goal of this PR is to disable concurrency in zstd compression to reduce its memory footprint and avoid a known issue with goroutine leaks. Please see - klauspost/compress#264

Link to tracking Issue: #8216

Testing: Benchmark test results below

BenchmarkCompression/zstdWithConcurrency/compress-10         	   21392	     55855 ns/op	187732.88 MB/s	 2329164 B/op	      28 allocs/op
BenchmarkCompression/zstdNoConcurrency/compress-10           	   29526	     39902 ns/op	262787.42 MB/s	 1758988 B/op	      15 allocs/op
input => 10.00 MB

@rnishtala-sumo rnishtala-sumo changed the title Adding benchmark tests for zstd Disable concurrency in zstd and add Benchmark tests for it Mar 12, 2024

codecov bot commented Mar 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.02%. Comparing base (b34f535) to head (29e2e37).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9749   +/-   ##
=======================================
  Coverage   91.02%   91.02%           
=======================================
  Files         353      353           
  Lines       18704    18704           
=======================================
  Hits        17026    17026           
  Misses       1350     1350           
  Partials      328      328           


@swiatekm-sumo
Contributor

Which of these tests is the one without concurrency? BenchmarkCompression/zstd#01/compress-10?

@rnishtala-sumo
Contributor Author

Yes, the second one. I disabled it using zstd.WithEncoderConcurrency(1). I didn't use sync.Pool here because the benchmark then reports 0 memory allocations, as shown below; I'm not sure of the reason yet.

BenchmarkCompression/zstd/compress-10         	23839293	        51.14 ns/op	205048310.34 MB/s	       0 B/op	       0 allocs/op

The benchmark test using a sync.Pool looks like this:

		compressor, err := newCompressor(codec) // initializes a new compressor in the sync.Pool
		if err != nil {
			b.Fatal(err)
		}
		b.ResetTimer()
		b.ReportAllocs()
		b.SetBytes(int64(len(payload)))
		for i := 0; i < b.N; i++ {
			compressor.compress(buf, stringReadCloser) // gets the compressor from the sync.Pool
			/*enc, _ = zstd.NewWriter(nil, zstd.WithEncoderConcurrency(5))
			enc.(writeCloserReset).Reset(buf)
			io.Copy(enc, stringReadCloser)*/
		}

I also tried initializing the compressor in the for loop and still get 0 memory allocations.
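
A possible explanation for the 0 B/op result, not confirmed anywhere in this thread: if stringReadCloser is created once outside the loop (the snippet does not show where it is built), it reaches EOF during the first iteration, and every later iteration copies zero bytes. The sketch below illustrates that effect only. It uses compress/gzip from the standard library as a stand-in for the zstd encoder so that it has no external dependencies, and copyCounts is a hypothetical helper, not code from the PR.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// copyCounts compresses payload `iterations` times into a discarded sink and
// returns how many uncompressed bytes each iteration actually consumed.
// With reuseReader set, one reader is shared by all iterations; otherwise a
// fresh reader is created for every iteration.
func copyCounts(payload string, iterations int, reuseReader bool) ([]int64, error) {
	shared := strings.NewReader(payload)
	zw := gzip.NewWriter(io.Discard) // stand-in for the pooled zstd encoder (assumption)
	counts := make([]int64, 0, iterations)
	for i := 0; i < iterations; i++ {
		src := shared
		if !reuseReader {
			src = strings.NewReader(payload)
		}
		zw.Reset(io.Discard)
		n, err := io.Copy(zw, src)
		if err != nil {
			return nil, err
		}
		if err := zw.Close(); err != nil {
			return nil, err
		}
		counts = append(counts, n)
	}
	return counts, nil
}

func main() {
	payload := strings.Repeat("a", 1024)
	reused, err := copyCounts(payload, 3, true)
	if err != nil {
		panic(err)
	}
	fresh, err := copyCounts(payload, 3, false)
	if err != nil {
		panic(err)
	}
	fmt.Println("shared reader:", reused) // [1024 0 0]
	fmt.Println("fresh reader: ", fresh)  // [1024 1024 1024]
}
```

If this is what happens in the benchmark, creating or resetting the reader inside the loop would make every iteration compress the full payload, and the reported allocations and throughput would then reflect real work.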

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented Mar 13, 2024

Also, @swiatekm-sumo @atoulme, we already seem to disable concurrency when decompressing payloads, as this comment mentions:
https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/confighttp/compression.go#L103

@rnishtala-sumo rnishtala-sumo force-pushed the bnchmk-zstd branch 4 times, most recently from 3db5d26 to 962db5e Compare March 13, 2024 19:42
@rnishtala-sumo rnishtala-sumo marked this pull request as ready for review March 13, 2024 19:42
@rnishtala-sumo rnishtala-sumo requested a review from a team as a code owner March 13, 2024 19:42
@rnishtala-sumo rnishtala-sumo force-pushed the bnchmk-zstd branch 2 times, most recently from 711e947 to 494d1b1 Compare March 13, 2024 20:09
Contributor

@codeboten codeboten left a comment

Any chance you were able to benchmark the performance of components using this new setting? It would be great to see a comparison

@@ -28,7 +28,7 @@ var (
 _ writeCloserReset = (*snappy.Writer)(nil)
 snappyPool = &compressor{pool: sync.Pool{New: func() any { return snappy.NewBufferedWriter(nil) }}}
 _ writeCloserReset = (*zstd.Encoder)(nil)
-zStdPool = &compressor{pool: sync.Pool{New: func() any { zw, _ := zstd.NewWriter(nil); return zw }}}
+zStdPool = &compressor{pool: sync.Pool{New: func() any { zw, _ := zstd.NewWriter(nil, zstd.WithEncoderConcurrency(1)); return zw }}}
Contributor

Thanks @rnishtala-sumo, can you add the same comment re. the setting of concurrency to 1 here for future readers of the code :)

Contributor Author

Added a comment above this change, thanks!

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented Mar 18, 2024

@codeboten This test from the PR benchmarks zstd with concurrency disabled:
https://github.com/open-telemetry/opentelemetry-collector/pull/9749/files#diff-f88b3f9180e1086acdd9ed98115611fc6be3ce4ebfcd5199b716ae56f6ca0573R88

Is this what you were looking for, or did you want to see a benchmark for an exporter that uses zstd compression?

@rnishtala-sumo rnishtala-sumo force-pushed the bnchmk-zstd branch 2 times, most recently from 44ef00b to 042febe Compare March 18, 2024 14:25
@rnishtala-sumo
Contributor Author

rnishtala-sumo commented Apr 2, 2024

@dmitryax @swiatekm-sumo @codeboten here are some results from running tests with multiple workers using the testbed in the contrib repo.

Options used for the test

options := testbed.LoadOptions{
		DataItemsPerSecond: 10_000,
		ItemsPerBatch:      1500,
		Parallel:           2,
	}

With zstd concurrency

=== RUN   TestLog10kDPS/OTLP-HTTP-zstd
2024/04/01 17:01:30 Starting mock backend...
2024/04/01 17:01:30 Starting Agent (/Users/rnishtala/src/opentelemetry-collector-contrib/bin/oteltestbedcol_darwin_arm64)
2024/04/01 17:01:30 Writing Agent log to /Users/rnishtala/src/opentelemetry-collector-contrib/testbed/tests/results/TestLog10kDPS/OTLP-HTTP-zstd/agent.log
2024/04/01 17:01:30 Agent running, pid=44650
2024/04/01 17:01:31 Starting load generator at 10000 items/sec.
2024/04/01 17:01:33 Agent RAM (RES):   0 MiB, CPU: 0.0% | Sent:    16,000 logs (9,420/sec) | Received:    16,000 items (5,325/sec)
2024/04/01 17:01:36 Agent RAM (RES): 119 MiB, CPU: 7.5% | Sent:    46,000 logs (9,795/sec) | Received:    46,000 items (7,663/sec)
2024/04/01 17:01:39 Agent RAM (RES): 120 MiB, CPU:10.7% | Sent:    76,000 logs (9,870/sec) | Received:    74,000 items (8,216/sec)
2024/04/01 17:01:42 Agent RAM (RES): 120 MiB, CPU:10.7% | Sent:   106,000 logs (9,909/sec) | Received:   104,000 items (8,664/sec)
2024/04/01 17:01:45 Agent RAM (RES): 120 MiB, CPU:10.6% | Sent:   136,000 logs (9,929/sec) | Received:   134,000 items (8,931/sec)

Without zstd concurrency

=== RUN   TestLog10kDPS/OTLP-HTTP-zstd
2024/04/02 11:02:04 Starting mock backend...
2024/04/02 11:02:04 Starting Agent (/Users/rnishtala/src/opentelemetry-collector-contrib/bin/oteltestbedcol_darwin_arm64)
2024/04/02 11:02:04 Writing Agent log to /Users/rnishtala/src/opentelemetry-collector-contrib/testbed/tests/results/TestLog10kDPS/OTLP-HTTP-zstd/agent.log
2024/04/02 11:02:04 Agent running, pid=13682
2024/04/02 11:02:06 Starting load generator at 10000 items/sec.
2024/04/02 11:02:07 Agent RAM (RES):   0 MiB, CPU: 0.0% | Sent:    15,000 logs (8,863/sec) | Received:    12,000 items (3,998/sec)
2024/04/02 11:02:10 Agent RAM (RES): 107 MiB, CPU: 4.7% | Sent:    42,000 logs (8,949/sec) | Received:    39,000 items (6,498/sec)
2024/04/02 11:02:13 Agent RAM (RES): 108 MiB, CPU: 6.4% | Sent:    69,000 logs (8,969/sec) | Received:    66,000 items (7,332/sec)
2024/04/02 11:02:16 Agent RAM (RES): 109 MiB, CPU: 6.1% | Sent:    96,000 logs (8,978/sec) | Received:    94,500 items (7,874/sec)
2024/04/02 11:02:19 Agent RAM (RES): 109 MiB, CPU: 5.8% | Sent:   123,000 logs (8,982/sec) | Received:   120,000 items (7,999/sec)
2024/04/02 11:02:21 Stopped generator. Sent:   138,000 logs (8,823/sec)

The above results show lower memory usage (about 120 MiB down to about 109 MiB) and lower CPU usage (about 10.6% down to about 5.8%) with zstd concurrency disabled.
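
To put the two Parallel: 2 runs side by side, the small sketch below computes relative changes from the last sample of each run in the logs above. percentChange is a hypothetical helper, and the inputs are copied from those logs. Note that the sent rate also differs between the runs, so part of the difference may come from the load rather than from the zstd setting.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Last sample of each Parallel: 2 run: with concurrency, then without.
	fmt.Printf("RAM (MiB):        %.1f%%\n", percentChange(120, 109))
	fmt.Printf("CPU (%%):          %.1f%%\n", percentChange(10.6, 5.8))
	fmt.Printf("Sent rate (/sec): %.1f%%\n", percentChange(9929, 8982))
}
```

This works out to roughly 9% less RAM and 45% less CPU, with a sent rate that is also roughly 9.5% lower in the second run.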

As we increase the number of workers to 4, memory usage goes up (even with zstd concurrency disabled), but CPU usage is lower.

options := testbed.LoadOptions{
		DataItemsPerSecond: 10_000,
		ItemsPerBatch:      1500,
		Parallel:           4,
	}
=== RUN   TestLog10kDPS/OTLP-HTTP-zstd
2024/04/02 11:08:37 Starting mock backend...
2024/04/02 11:08:37 Starting Agent (/Users/rnishtala/src/opentelemetry-collector-contrib/bin/oteltestbedcol_darwin_arm64)
2024/04/02 11:08:37 Writing Agent log to /Users/rnishtala/src/opentelemetry-collector-contrib/testbed/tests/results/TestLog10kDPS/OTLP-HTTP-zstd/agent.log
2024/04/02 11:08:37 Agent running, pid=31860
2024/04/02 11:08:38 Starting load generator at 10000 items/sec.
2024/04/02 11:08:40 Agent RAM (RES):   0 MiB, CPU: 0.0% | Sent:     6,000 logs (3,530/sec) | Received:     6,000 items (1,999/sec)
2024/04/02 11:08:43 Agent RAM (RES):  69 MiB, CPU: 2.8% | Sent:    24,000 logs (5,107/sec) | Received:    24,000 items (3,999/sec)
2024/04/02 11:08:46 Agent RAM (RES): 117 MiB, CPU: 3.2% | Sent:    42,000 logs (5,453/sec) | Received:    42,000 items (4,665/sec)
2024/04/02 11:08:49 Agent RAM (RES): 135 MiB, CPU: 3.0% | Sent:    60,000 logs (5,607/sec) | Received:    60,000 items (4,999/sec)
2024/04/02 11:08:52 Agent RAM (RES): 136 MiB, CPU: 3.0% | Sent:    78,000 logs (5,693/sec) | Received:    78,000 items (5,199/sec)
2024/04/02 11:08:54 Stopped generator. Sent:    96,000 logs (5,894/sec)

Overall, disabling zstd concurrency reduces the memory footprint and helps us avoid the goroutine leaks described in this issue: klauspost/compress#264

@@ -0,0 +1,96 @@
// Copyright The OpenTelemetry Authors
Contributor

Does it make sense to remove this test, since it's only benchmarking underlying components and not options in the collector itself?

Contributor Author

The only reason I think we may want to keep this is that it essentially benchmarks the configcompression.Type option used in the collector (zstd in this PR), especially since we know that there's an issue with enabling concurrency in zstd. Having said that, I'm not against removing this test. I would also like your opinion on this comment.

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented Apr 9, 2024

@codeboten @dmitryax please let me know if you'd like to see additional changes to this PR, or more load tests.

@dmitryax dmitryax merged commit 7a8954f into open-telemetry:main Apr 12, 2024
47 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 12, 2024