Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

moved average benchmark results to p90 results #2152

Merged
merged 4 commits into from
Jul 13, 2022

Conversation

stefanosiano
Copy link
Member

@stefanosiano stefanosiano commented Jul 4, 2022

馃摐 Description

  1. Moved benchmark results from average to p90 results
  2. Increased the measured iterations from 15 to 20 by default
  3. Profiling benchmark now prints raw values to the console, to later read them from the log file
  4. Removed testOrchestrator option from saucelabs config and gradle
  5. Benchmarks in SauceLabs will now run on 2 devices with Android 12, 3 with Android 11, 2 with Android 10
  6. We will just assert on the cpu overhead, not on duration or dropped frames anymore, even if we are printing them
  7. Added a test to send profiles to a Sentry project (dogfooding test)

馃挕 Motivation and Context

  1. To get better results from the benchmark, the p90 is a better metric than the average.
  2. We decided to move to 20 measured iteration to get reliable results.
  3. Since we want benchmark numbers to show on docs, we are going to print them to the log file. This is a quick solution to allow us to download the log, parse it, and get all the raw numbers we want to feed our systems. Later on, we will stop printing these values, and will send them to an endpoint.
  4. I realized the tests were not working on SauceLabs (even if they were suceeding) due to an issue with androidx test orchestrator. Removing it fixed the problem.
  5. The devices on SauceLbas to run the benchmarks on have been updated. Now we test devices for Android 10, 11 and 12. Doing so, we cover more than 70% of currently used devices, based on https://gs.statcounter.com/os-version-market-share/android/mobile-tablet/worldwide.
  6. Duration changes can get pretty unpredictable, due to system things, like thermal throttling, power saving, etc. Cpu overhead is, instead, pretty consistent across all devices, and that's the most important thing at the moment.
  7. The backend needs some instrumented profiles

#skip-changelog

馃挌 How did you test it?

I ran the saucelabs commands manually, and the tests ran perfectly fine. 馃コ

馃摑 Checklist

  • I reviewed the submitted code
  • I added tests to verify the changes
  • I updated the docs if needed
  • No breaking changes

馃敭 Next steps

We will send the benchmark data to a specific endpoint, but this will be done in another pr later.
I just need to check the log files and if we can get all the values just fine.

increased the measured iterations from 15 to 20 by default
removed testOrchestrator option from saucelabs config and gradle
profiling benchmark now prints raw values to the console, to later read them from the log file
Benchmarks in SauceLabs will now run on 2 devices with Andorid 12, 3 with Android 11, 2 with Android 10
Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave this a quick pass. What's missing to finish this PR, @stefanosiano?

@stefanosiano
Copy link
Member Author

Gave this a quick pass. What's missing to finish this PR, @stefanosiano?

Just a few changes on how to calculate the p90 values, as benchmarks are failing because of it 馃槄
And a change for refresh rate to avoid crashes on Android 11/12 that i didn't realize before 馃槄

updated the percentile calculation method
warmup iterations reduced from 3 to 2
the measured iterations now run in alternated order
added proguard file to fix ui-tests
changed low-end Android 10 device to a less powerful device
sdk init duration increase threshold increased to 250 milliseconds
cpu overhead range for the same operation increased to -2%..2%
even if we print all values, we just assert on the cpu overhead
added a test to send profiles to a Sentry project (dogfooding test)
@codecov-commenter
Copy link

Codecov Report

Merging #2152 (70ed5ba) into main (fda3319) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##               main    #2152      +/-   ##
============================================
- Coverage     80.95%   80.94%   -0.02%     
- Complexity     3257     3290      +33     
============================================
  Files           231      233       +2     
  Lines         11964    12044      +80     
  Branches       1589     1594       +5     
============================================
+ Hits           9686     9749      +63     
- Misses         1698     1712      +14     
- Partials        580      583       +3     
Impacted Files Coverage 螖
...ry/src/main/java/io/sentry/TransactionContext.java 85.71% <0.00%> (-14.29%) 猬囷笍
...entry/src/main/java/io/sentry/NoOpTransaction.java 25.00% <0.00%> (-0.81%) 猬囷笍
sentry/src/main/java/io/sentry/OutboxSender.java 65.64% <0.00%> (-0.74%) 猬囷笍
sentry/src/main/java/io/sentry/SpanContext.java 83.94% <0.00%> (-0.68%) 猬囷笍
...ain/java/io/sentry/protocol/SentryTransaction.java 88.97% <0.00%> (-0.28%) 猬囷笍
sentry/src/main/java/io/sentry/TraceContext.java 86.74% <0.00%> (-0.01%) 猬囷笍
sentry/src/main/java/io/sentry/Span.java 100.00% <0.00%> (酶)
sentry/src/main/java/io/sentry/SentryOptions.java 82.03% <0.00%> (酶)
sentry/src/main/java/io/sentry/TracesSampler.java 100.00% <0.00%> (酶)
...y/src/main/java/io/sentry/util/SampleRateUtil.java 88.88% <0.00%> (酶)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
螖 = absolute <relative> (impact), 酶 = not affected, ? = missing data
Powered by Codecov. Last update fda3319...70ed5ba. Read the comment docs.

@stefanosiano stefanosiano marked this pull request as ready for review July 11, 2022 14:02
@stefanosiano
Copy link
Member Author

The only thing is that benchmarks take around 1 hour to run.
Are we fine with that?

@philipphofmann
Copy link
Member

The only thing is that benchmarks take around 1 hour to run. Are we fine with that?

Faster would be better, of course. What's the bottleneck? How could we reduce this?

Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but ideally, we reduce the 1h duration.

@stefanosiano
Copy link
Member Author

LGTM, but ideally, we reduce the 1h duration.

There is this pr, with a slack thread on that and some consideration on this notion doc

useTestOrchestrator: true
- id: OnePlus_9_Pro_real_us # OnePlus 9 Pro - api 30 (11) - high end
- id: Google_Pixel_4_real_us # Google Pixel 4 - api 30 (11) - mid end
- id: Google_Pixel_2_real_us # Google Pixel 2 - api 30 (11) - low end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if Pixel 2 is really a low-end device, but that's alright I guess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's one of the lowest-end provided by saucelabs for android 11. The other at ~same level should be Samsung Galaxy A50, but the pixel 2 is a 5 years old device, so it's fine for benchmarks

@stefanosiano stefanosiano merged commit f160e0d into main Jul 13, 2022
@stefanosiano stefanosiano deleted the tests/android-benchmark-percentiles branch July 13, 2022 11:38
fun printAllRuns(prefix: String) {
repeat(iterations) { index ->

println("$prefix ==================== Iteration $index ====================")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for me to understand - this is gonna be printed into Logcat and later you pull this through adb probably, and parse, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's printed to console, which then it's written in the device.log from saucelabs. Then I will download the log file and parse it locally.
Later all of this will be automated when an endpoint to send data to will be available

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants