Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expanding Built-In Performance Metrics for Browser JavaScript #3

Merged
merged 7 commits into from
Sep 21, 2022
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
57 changes: 57 additions & 0 deletions text/0003-browser-js-built-in-metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
* Start Date: 2022-08-18
* RFC Type: feature
* RFC PR: https://github.com/getsentry/rfcs/pull/3
* RFC Status: active

# Expanding Built-In Performance Metrics for Browser JavaScript.

# Summary

This RFC details expanding list of built-in performance metrics for the browser JavaScript SDK, with additional data the Browser SDK captures. It propose adding two metrics that are already captured by the SDK: `connection.rtt` and `connection.downlink`, and three brand new metrics that are not yet captured, `device.memory`, `hardware.concurrency`, and `long_task.count`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every new metric we extract increases the volume we need to ingest on the metrics pipeline and increases the storage size very quickly as suddenly all sdks start sending them for every transactions as soon as they are upgraded.

Is there a way we can phase such a rollout ?
Right now we extract an average of 10 metrics per transaction. If we add 5 that represents 50%. Considering that javascript is one of the major sdks we have this will be a substantial increase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So 2 of these connection.rtt, and connection.downlink are already being sent by the SDK - but they are considered custom metrics instead of built-in metrics.

I agree though that we should consider phasing such a rollout, great suggestion. Ok so let's do this.

  1. Convert connection.rtt and connection.downlink into built-in measurements. The value here is obv, and we are already collecting this data -> monitor the effect for 1 week
  2. Add hardware.concurrency -> monitor the effect for 1 week
  3. Add device.memory -> monitor the effect for 1 week

Based on some discussion with performance team and @k-fish we are dropping long_task.count from the proposal, it's not that valuable as we already collect the spans.

If we consider the effect not worth it, we can drop device.memory, but I feel we should not drop hardware.concurrency.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok,
also, something important to keep in mind.
If you add a standard tag to metrics or change the name of one (basically anything that is not custom), please register it on this file and statically assign it an ID.
https://github.com/getsentry/sentry/blob/master/src/sentry/sentry_metrics/indexer/strings.py.

So the indexer will not assign a new ID to that tag key for every single organization and they all will share the same ID


Note: References to `performance metrics` (external name) are equivalent to `measurements` (internal name) for the purpose of this document.

# Background

The Sentry product now supports the ability to set [Performance Metrics in the product](https://docs.sentry.io/product/sentry-basics/metrics), via attaching numeric tags to transaction data. Internally, we refer to these numeric tags as `measurements`. Some of these performance metrics are considered "built-in", and are automatically sent from certain SDKs. These built-in performance metrics are defined in an [explicit allowlist in Sentry's Relay config](https://github.com/getsentry/sentry/blob/dddb995d6f33527cc5fd2b6c6d484b29bb02253d/src/sentry/relay/config/__init__.py#L407-L428), and are [defined in Relay themselves as well](https://github.com/getsentry/relay/blob/4f3e224d5eeea8922fe42163552e8f20db674e86/relay-server/src/metrics_extraction/transactions.rs#L270-L276).

The Browser JavaScript SDKs currently has [seven built-in performance metrics](https://docs.sentry.io/platforms/javascript/performance/instrumentation/performance-metrics/), `fp`, `fcp`, `lcp`, `fid`, `cls`, `ttfb`, and `ttfb.requesttime`. In addition to built-in performance metrics, the product supports sending arbitrary custom performance metrics on transactions. For example in JavaScript:

```ts
const transaction = Sentry.getCurrentHub().getScope().getTransaction();

// Record amount of times localStorage was read
transaction.setMeasurement('localStorageRead', 4);
```

In the product, we display the built-in performance metrics and custom performance metrics in different sections of the event details. In addition, transactions have a [limit of 5 custom performance metrics that they can send](https://github.com/getsentry/sentry/blob/dddb995d6f33527cc5fd2b6c6d484b29bb02253d/src/sentry/relay/config/__init__.py#L430-L431).

# Proposals

## Existing Data

Aside from the seven built-in performance metrics the JavaScript SDKs set, the JavaScript SDK also sets [`connection.rtt` and `connection.downlink` as performance metrics](https://github.com/getsentry/sentry-javascript/blob/74db5275d8d5a28cfb18c5723575ea04c5ed5f02/packages/tracing/src/browser/metrics/index.ts#L396-L402). Since these are not in the built-in allow list in Relay/Sentry, **they are considered custom performance metrics, and take away from the custom performance metric quota that exists on transactions**.

`connection.rtt` is the [the estimated effective round-trip time of the current connection, rounded to the nearest multiple of 25 milliseconds](https://developer.mozilla.org/en-US/docs/Web/API/NetworkInformation/rtt). `connection.downlink` is the [effective bandwidth estimate in megabits per second, rounded to the nearest multiple of 25 kilobits per seconds](https://developer.mozilla.org/en-US/docs/Web/API/NetworkInformation/downlink). These were originally added to the SDK [Oct 2020](https://github.com/getsentry/sentry-javascript/pull/2966) to help grab info about network connectivity information.

Either we choose to promote these two values to built-in performance metrics, or we remove them entirely if we feel like they are not high value.

## New Data

In the same PR that added `connection.rtt` and `connection.downlink`, we also added support for grabbing [`deviceMemory`](https://developer.mozilla.org/en-US/docs/Web/API/Navigator/deviceMemory) and [`hardwareConcurrency`](https://developer.mozilla.org/en-US/docs/Web/API/Navigator/hardwareConcurrency). Currently these are set as [tags on the transaction](https://github.com/getsentry/sentry-javascript/blob/74db5275d8d5a28cfb18c5723575ea04c5ed5f02/packages/tracing/src/browser/metrics/index.ts#L405-L411). This should become performance metrics, as SDKs should not be settings tags like this one events, and there is value in seeing these numeric values on transactions. Here we would also rename `deviceMemory` -> `device.memory` and `hardwareConcurrency` -> `hardware.concurrency`.

Another additional option is to move all [browser Navigator](https://developer.mozilla.org/en-US/docs/Web/API/Navigator) related fields into a brand new SDK context - that is avaliable to all events, not just performance ones.

[Long Tasks](https://developer.mozilla.org/en-US/docs/Web/API/Long_Tasks_API) are JavaScript tasks that take 50ms or longer to execute. They are considered problematic because JavaScript is single threaded, so blocking the main thread is a big performance hit. In the SDK, [we track individual long task occurences as spans](https://github.com/getsentry/sentry-javascript/blob/74db5275d8d5a28cfb18c5723575ea04c5ed5f02/packages/tracing/src/browser/metrics/index.ts#L54-L59) and record them onto transactions.

In Sentry, we've been recording [`longTaskCount` on transactions](https://github.com/getsentry/sentry/blob/20780a5bdd988daa44825ce3c295452c280a9add/static/app/utils/performanceForSentry.tsx#L125) as a Custom Performance Metric for the Sentry frontend. So far, tracking the `longTaskCount` has been valuable as it allows us at a high level to see the most problematic transactions when looking at CPU usage. Since we already record long task spans in the SDK, it should be fairly easy to generate the count as a measurement, and promote into a built-in measurement. Here we would use `long_task.count` as the measurement name instead of `longTaskCount` that we used for internal testing.

## Decisions

Below Records each proposed built-in measurement, and the decision that was taken around them:

- [ ] `connection.rtt`
- [ ] `connection.downlink`
- [ ] `device.memory`
- [ ] `hardware.concurrency`
- [ ] `long_task.count`