moved average benchmark results to p90 results #2152

stefanosiano · 2022-07-04T16:38:59Z

📜 Description

Moved benchmark results from average to p90 results
Increased the measured iterations from 15 to 20 by default
Profiling benchmark now prints raw values to the console, to later read them from the log file
Removed testOrchestrator option from saucelabs config and gradle
Benchmarks in SauceLabs will now run on 2 devices with Android 12, 3 with Android 11, 2 with Android 10
We now assert only on the CPU overhead, no longer on duration or dropped frames, although we still print them
Added a test to send profiles to a Sentry project (dogfooding test)

💡 Motivation and Context

To get better results from the benchmark, the p90 is a better metric than the average.
We decided to move to 20 measured iterations to get reliable results.
Since we want benchmark numbers to show up in the docs, we are going to print them to the log file. This is a quick solution that lets us download the log, parse it, and get all the raw numbers we want to feed into our systems. Later on, we will stop printing these values and will send them to an endpoint instead.
I realized the tests were not working on SauceLabs (even though they were succeeding) due to an issue with the AndroidX Test Orchestrator. Removing it fixed the problem.
The SauceLabs devices that run the benchmarks have been updated. We now test devices with Android 10, 11 and 12. Doing so, we cover more than 70% of currently used devices, based on https://gs.statcounter.com/os-version-market-share/android/mobile-tablet/worldwide.
Duration changes can be quite unpredictable due to system-level factors such as thermal throttling and power saving. CPU overhead is, instead, fairly consistent across all devices, and that's the most important thing at the moment.
The backend needs some instrumented profiles.
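
To illustrate the p90 point above, here is a minimal sketch of a percentile calculation using the nearest-rank method. The function name `percentile` and the sample values are hypothetical; the calculation actually used by the benchmark is not shown in this conversation and may differ (for example, it may interpolate between values).

```kotlin
/**
 * Nearest-rank percentile: the smallest sample such that at least
 * p percent of all samples are less than or equal to it.
 * Hypothetical helper, not the benchmark's actual implementation.
 */
fun percentile(samples: List<Double>, p: Int): Double {
    require(samples.isNotEmpty()) { "samples must not be empty" }
    require(p in 1..100) { "p must be in 1..100" }
    val sorted = samples.sorted()
    // 1-based rank = ceil(p * n / 100), computed with integer arithmetic.
    val rank = (p * sorted.size + 99) / 100
    return sorted[rank - 1]
}

fun main() {
    // 20 made-up durations in milliseconds, the last one being an outlier.
    val durations = List(19) { 100.0 + it } + 900.0
    println("average = ${durations.average()}")
    println("p90     = ${percentile(durations, 90)}")
}
```

With 20 measured iterations, the nearest-rank p90 is the 18th smallest value, so one or two outlier runs do not move it, whereas they shift the average noticeably.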

#skip-changelog

💚 How did you test it?

I ran the saucelabs commands manually, and the tests ran perfectly fine. 🥳

📝 Checklist

I reviewed the submitted code
I added tests to verify the changes
I updated the docs if needed
No breaking changes

🔮 Next steps

We will send the benchmark data to a specific endpoint, but this will be done in a separate PR later.
I just need to check the log files and confirm that we can get all the values.

increased the measured iterations from 15 to 20 by default
removed testOrchestrator option from saucelabs config and gradle
profiling benchmark now prints raw values to the console, to later read them from the log file
Benchmarks in SauceLabs will now run on 2 devices with Android 12, 3 with Android 11, 2 with Android 10

philipphofmann

Gave this a quick pass. What's missing to finish this PR, @stefanosiano?

.sauce/sentry-uitest-android-benchmark.yml

sentry-android-integration-tests/sentry-uitest-android-benchmark/build.gradle.kts

stefanosiano · 2022-07-05T17:07:55Z

Gave this a quick pass. What's missing to finish this PR, @stefanosiano?

Just a few changes to how the p90 values are calculated, as the benchmarks are failing because of it 😅
And a change to the refresh rate handling to avoid crashes on Android 11/12 that I hadn't noticed before 😅

updated the percentile calculation method
warmup iterations reduced from 3 to 2
the measured iterations now run in alternated order
added proguard file to fix ui-tests
changed low-end Android 10 device to a less powerful device
sdk init duration increase threshold increased to 250 milliseconds
cpu overhead range for the same operation increased to -2%..2%
even if we print all values, we just assert on the cpu overhead
added a test to send profiles to a Sentry project (dogfooding test)
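
As a rough illustration of two items in the commit message above (alternated iteration order and the -2%..2% CPU overhead range), here is a minimal sketch. All names (`Runs`, `measureAlternated`, `timeNanos`, `overheadWithinRange`) are hypothetical, and swapping the order of the two operations on every other iteration is only one possible reading of "alternated order"; the benchmark's real code is not shown in this conversation.

```kotlin
/** Durations in nanoseconds collected for the two compared operations. */
data class Runs(val first: List<Long>, val second: List<Long>)

/** Times a single call of [op] in nanoseconds. */
fun timeNanos(op: () -> Unit): Long {
    val start = System.nanoTime()
    op()
    return System.nanoTime() - start
}

/**
 * Runs both operations [iterations] times. Even iterations run op1 first,
 * odd iterations run op2 first, so neither operation always runs second.
 */
fun measureAlternated(iterations: Int, op1: () -> Unit, op2: () -> Unit): Runs {
    val first = mutableListOf<Long>()
    val second = mutableListOf<Long>()
    repeat(iterations) { index ->
        if (index % 2 == 0) {
            first.add(timeNanos(op1))
            second.add(timeNanos(op2))
        } else {
            second.add(timeNanos(op2))
            first.add(timeNanos(op1))
        }
    }
    return Runs(first, second)
}

/** True when the measured CPU overhead (in percent) lies inside the accepted range. */
fun overheadWithinRange(
    overheadPercent: Double,
    range: ClosedFloatingPointRange<Double> = -2.0..2.0
): Boolean = overheadPercent in range

fun main() {
    val runs = measureAlternated(4, { Thread.sleep(1) }, { Thread.sleep(1) })
    println("collected ${runs.first.size} and ${runs.second.size} samples")
    println("1.5% accepted: ${overheadWithinRange(1.5)}")
}
```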

codecov-commenter · 2022-07-08T17:49:09Z

Codecov Report

Merging #2152 (70ed5ba) into main (fda3319) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##               main    #2152      +/-   ##
============================================
- Coverage     80.95%   80.94%   -0.02%     
- Complexity     3257     3290      +33     
============================================
  Files           231      233       +2     
  Lines         11964    12044      +80     
  Branches       1589     1594       +5     
============================================
+ Hits           9686     9749      +63     
- Misses         1698     1712      +14     
- Partials        580      583       +3

Impacted Files	Coverage Δ
...ry/src/main/java/io/sentry/TransactionContext.java	`85.71% <0.00%> (-14.29%)`	⬇️
...entry/src/main/java/io/sentry/NoOpTransaction.java	`25.00% <0.00%> (-0.81%)`	⬇️
sentry/src/main/java/io/sentry/OutboxSender.java	`65.64% <0.00%> (-0.74%)`	⬇️
sentry/src/main/java/io/sentry/SpanContext.java	`83.94% <0.00%> (-0.68%)`	⬇️
...ain/java/io/sentry/protocol/SentryTransaction.java	`88.97% <0.00%> (-0.28%)`	⬇️
sentry/src/main/java/io/sentry/TraceContext.java	`86.74% <0.00%> (-0.01%)`	⬇️
sentry/src/main/java/io/sentry/Span.java	`100.00% <0.00%> (ø)`
sentry/src/main/java/io/sentry/SentryOptions.java	`82.03% <0.00%> (ø)`
sentry/src/main/java/io/sentry/TracesSampler.java	`100.00% <0.00%> (ø)`
...y/src/main/java/io/sentry/util/SampleRateUtil.java	`88.88% <0.00%> (ø)`
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fda3319...70ed5ba.

stefanosiano · 2022-07-11T14:03:18Z

The only thing is that benchmarks take around 1 hour to run.
Are we fine with that?

philipphofmann · 2022-07-12T14:34:25Z

The only thing is that benchmarks take around 1 hour to run. Are we fine with that?

Faster would be better, of course. What's the bottleneck? How could we reduce this?

philipphofmann

LGTM, but ideally, we reduce the 1h duration.

...n-tests/sentry-uitest-android/src/androidTest/java/io/sentry/uitest/android/EnvelopeTests.kt

...ests/sentry-uitest-android/src/main/java/io/sentry/uitest/android/ProfilingSampleActivity.kt

stefanosiano · 2022-07-12T15:03:30Z

LGTM, but ideally, we reduce the 1h duration.

There is a separate PR for that, with a Slack thread about it and some considerations in a Notion doc.

added a flag to stop fibonacci on activity pause

romtsn · 2022-07-13T11:27:48Z

.sauce/sentry-uitest-android-benchmark.yml

-      useTestOrchestrator: true
+      - id: OnePlus_9_Pro_real_us # OnePlus 9 Pro - api 30 (11) - high end
+      - id: Google_Pixel_4_real_us # Google Pixel 4 - api 30 (11) - mid end
+      - id: Google_Pixel_2_real_us # Google Pixel 2 - api 30 (11) - low end


I'm not sure if Pixel 2 is really a low-end device, but that's alright I guess

It's one of the lowest-end devices provided by SauceLabs for Android 11. The other one at roughly the same level should be the Samsung Galaxy A50, but the Pixel 2 is a 5-year-old device, so it's fine for benchmarks.

romtsn · 2022-07-13T11:39:36Z

...rk/src/androidTest/java/io/sentry/uitest/android/benchmark/util/BenchmarkComparisonResult.kt

+    fun printAllRuns(prefix: String) {
+        repeat(iterations) { index ->
+
+            println("$prefix ==================== Iteration $index ====================")


Just for me to understand - this is gonna be printed into Logcat and later you pull this through adb probably, and parse, correct?

It's printed to the console, and from there it is written to the device.log provided by SauceLabs. Then I will download the log file and parse it locally.
Later, all of this will be automated once an endpoint to send the data to is available.
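
The local parsing step described above could look roughly like the following sketch. Only the iteration header format comes from the `printAllRuns` snippet; the function name `parseRuns`, the `BENCH` prefix, the logcat-style line layout, and the `cpu: ...` value lines are assumptions made up for illustration, since the real device.log format is not shown here.

```kotlin
/** Matches the iteration header printed by printAllRuns, e.g. "==== Iteration 3 ====". */
val iterationHeader = Regex("""=+ Iteration (\d+) =+""")

/**
 * Groups the log lines that contain [prefix] by iteration index.
 * Lines without the prefix are ignored; lines seen before the first
 * iteration header are dropped.
 */
fun parseRuns(logLines: List<String>, prefix: String): Map<Int, List<String>> {
    val runs = linkedMapOf<Int, MutableList<String>>()
    var current: MutableList<String>? = null
    for (line in logLines) {
        val at = line.indexOf(prefix)
        if (at < 0) continue // unrelated log line
        val payload = line.substring(at + prefix.length).trim()
        val header = iterationHeader.find(payload)
        if (header != null) {
            // Start (or resume) collecting values for this iteration.
            val index = header.groupValues[1].toInt()
            current = runs.getOrPut(index) { mutableListOf() }
        } else {
            current?.add(payload)
        }
    }
    return runs
}

fun main() {
    // Hypothetical log excerpt; the real device.log layout may differ.
    val log = listOf(
        "I/System.out: BENCH ==================== Iteration 0 ====================",
        "I/System.out: BENCH cpu: 1.5",
        "D/Other: unrelated line",
        "I/System.out: BENCH ==================== Iteration 1 ====================",
        "I/System.out: BENCH cpu: 1.7"
    )
    parseRuns(log, "BENCH").forEach { (iteration, values) ->
        println("iteration $iteration -> $values")
    }
}
```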

philipphofmann reviewed Jul 5, 2022

stefanosiano marked this pull request as ready for review July 11, 2022 14:02

stefanosiano requested review from adinauer and romtsn as code owners July 11, 2022 14:02

philipphofmann approved these changes Jul 12, 2022

stefanosiano and others added 2 commits July 13, 2022 11:21

removed useless sleep at the end of a test

2c50289

added a flag to stop fibonacci on activity pause

Merge branch 'main' into tests/android-benchmark-percentiles

91ed3c0

romtsn reviewed Jul 13, 2022

stefanosiano merged commit f160e0d into main Jul 13, 2022

stefanosiano deleted the tests/android-benchmark-percentiles branch July 13, 2022 11:38

romtsn reviewed Jul 13, 2022
