apollo-server-core: unified Studio reporting (#4142)
The usage reporting plugin in `apollo-server-core` is not the first tool Apollo
built to report usage to Studio. Previous iterations such as `optics-agent` and
`engineproxy` reported a combination of detailed per-field single-operation
performance *traces* and summarized *stats* of operations to Apollo's
servers. When we built this TypeScript usage reporting plugin in 2018, for the
sake of expediency we did something different: it only sent traces to Apollo's
servers. This meant that the performance of every single user operation
was described in detail to Apollo's servers. Studio is not an exhaustive trace
warehouse: we have always *sampled* the traces received, making only some of
them available via Studio's Traces UI. The other traces were converted to stats
inside Studio's servers.

While this meant that the reporting agent was simpler than the previous
implementations (no need to be able to describe performance statistics), it also
meant that the protocol used to talk to Studio consumed a lot more bandwidth (as
well as CPU time for encoding traces).

This PR returns us to the world where Studio usage is reported as a combination
of stats and traces. It takes a slightly different approach than the previous
implementations: instead of reporting stats and traces in parallel, usage
reports contain both stats and traces. Each GraphQL operation is described
either as a trace or as stats, not both.
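
As a rough illustration of the "either a trace or stats, not both" rule, the sketch below routes each finished operation into exactly one of two collections. The names `TracesAndStatsSketch`, `Operation`, and `OperationStats` are hypothetical and much simpler than the generated report types.

```typescript
// Hypothetical shapes for illustration only; the real report types are
// generated from reports.proto and carry far more detail.
interface Operation {
  statsReportKey: string; // identifies the operation being reported
  durationNs: number;
  hasErrors: boolean;
}

interface OperationStats {
  requestCount: number;
  totalDurationNs: number;
  requestsWithErrorsCount: number;
}

class TracesAndStatsSketch {
  readonly traces: Operation[] = [];
  readonly stats = new Map<string, OperationStats>();

  // Each operation lands in exactly one of the two collections.
  add(op: Operation, asTrace: boolean): void {
    if (asTrace) {
      this.traces.push(op);
      return;
    }
    let s = this.stats.get(op.statsReportKey);
    if (!s) {
      s = { requestCount: 0, totalDurationNs: 0, requestsWithErrorsCount: 0 };
      this.stats.set(op.statsReportKey, s);
    }
    s.requestCount += 1;
    s.totalDurationNs += op.durationNs;
    if (op.hasErrors) s.requestsWithErrorsCount += 1;
  }
}
```

Operations folded into stats cost a few counters per operation key rather than a full per-field trace, which is where the bandwidth and CPU savings are expected to come from.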

We expect this to significantly reduce the network and CPU requirements of
sending usage reports to Studio. It should not significantly affect the
experience of using Studio: we have always heavily sampled traces in Studio
before saving them to the trace warehouse, and the default heuristic for which
operations to send as traces works similarly to the heuristic used in Studio's
servers.

This PR introduces an option `experimental_sendOperationAsTrace` to allow you to
control whether a given operation is sent as trace or stats. This is truly an
experimental option that may change at any time. For example, you should not
rely on the fact that this will be called on all operations after the operation
is done with a full trace, or on its signature, or even that it exists. It is likely
that future improvements to the usage reporting plugin will change how
operations are observed so that we don't have to collect a full trace before
deciding how to represent the operation.
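
A minimal sketch of the kind of predicate this option could take is shown below. The callback shape `(trace, statsReportKey) => boolean`, the `TraceLike` type, and the `makeSendAsTrace` helper are all assumptions made for illustration; as noted above, the real signature may differ or change. The policy itself is an example, not the plugin's default heuristic.

```typescript
// Hypothetical minimal trace shape; the real Trace type has many more fields.
interface TraceLike {
  durationNs: number;
  hasErrors: boolean;
}

// Assumed callback shape: return true to send the operation as a trace,
// false to fold it into stats. This signature is an assumption.
type SendOperationAsTrace = (
  trace: TraceLike,
  statsReportKey: string,
) => boolean;

// Example policy: always trace operations with errors, and otherwise allow
// at most one trace per operation key within each time window.
function makeSendAsTrace(
  windowMs: number,
  now: () => number = Date.now,
): SendOperationAsTrace {
  const lastTraced = new Map<string, number>();
  return (trace, statsReportKey) => {
    if (trace.hasErrors) return true;
    const t = now();
    const last = lastTraced.get(statsReportKey);
    if (last !== undefined && t - last < windowMs) return false;
    lastTraced.set(statsReportKey, t);
    return true;
  };
}
```

The injectable `now` function is only there to make the policy easy to test; a function like the one returned here would be supplied as the value of `experimental_sendOperationAsTrace`.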

Some other notes:

- Upgrade our fork `@apollo/protobufjs` with a few improvements:
  - New `js_use_toArray` option which lets you encode repeated fields from
    objects that aren't stored in memory as arrays but expose `toArray`
    methods. We use this so that we can build up `DurationHistogram`s and
    map-like objects in a non-array fashion and only convert to array at
    encoding time.
  - New `js_preEncoded` option which allows you to encode messages in repeated
    fields as buffers (Uint8Arrays). This helps amortize encoding cost of a
    large message over time instead of freezing the event loop to encode the
    whole message at once. This replaces an old hack we used for one field with
    something built in to the protobuf compiler (including correct TypeScript
    typings).
  - New `--no-from-object` flag which we use to reduce the size of generated
    code (as we don't use the fromObject protobuf.js API).
- In order to help us validate that the trace->stats code in this PR matches
  similar code in Studio's servers, the flag
  `internal_includeTracesContributingToStats` sends the traces that contribute
  to stats in a special field. This is something we only use as part of our own
  validation in our servers; for your graphs it will have no effect other than
  increasing message size.
- Viewing traces in Studio is only available on paid plans. The usage-reporting
  endpoint now tells the plugin whether traces are supported on your graph's
  plan; if not supported, the plugin will switch to sending all operations as
  stats (regardless of the value of `experimental_sendOperationAsTrace`) after
  the first report.
- We estimate the message size for comparison against
  `maxUncompressedReportSize` by roughly guessing how big the leaf nodes of the
  stats messages will be, rather than carefully counting how much space is used
  by each number and histogram. We do take the lengths of all strings into
  account.
- By mistake, this plugin never sent the cache policy on traces, meaning that
  visualizing cache-specific stats in Studio did not work. This is now fixed.
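
The size estimate described in the notes above can be pictured roughly as follows. This is a sketch under stated assumptions: `StatsEntry`, `estimateEntrySize`, `ReportSizeEstimator`, and the per-number constant are hypothetical and not taken from the plugin.

```typescript
// Flat guess, in bytes, for every numeric leaf. An illustrative constant,
// not the value the plugin uses.
const ESTIMATED_BYTES_PER_NUMBER = 8;

// Hypothetical stats entry: a few strings plus numeric leaves.
interface StatsEntry {
  clientName: string;
  clientVersion: string;
  counts: number[]; // e.g. request counts or histogram buckets
}

// Strings are counted by their length (in UTF-16 code units here, which only
// approximates encoded bytes); numbers are charged a flat guess each.
function estimateEntrySize(entry: StatsEntry): number {
  return (
    entry.clientName.length +
    entry.clientVersion.length +
    entry.counts.length * ESTIMATED_BYTES_PER_NUMBER
  );
}

// Keeps a running total so the caller can decide when a report is "full".
class ReportSizeEstimator {
  private total = 0;

  constructor(private readonly maxUncompressedReportSize: number) {}

  add(entry: StatsEntry): void {
    this.total += estimateEntrySize(entry);
  }

  shouldSend(): boolean {
    return this.total >= this.maxUncompressedReportSize;
  }
}
```

The trade-off is deliberate: a flat per-leaf guess is cheap to maintain on every operation, while string lengths are the part that varies most and is easy to measure exactly.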

This project was begun by @jsegaran and completed by @glasser.
jsegaran committed Apr 28, 2021
1 parent 78304ec commit 8ce26dd
Showing 14 changed files with 7,355 additions and 138 deletions.
26 changes: 13 additions & 13 deletions package-lock.json

4 changes: 2 additions & 2 deletions packages/apollo-reporting-protobuf/package.json
@@ -7,7 +7,7 @@
   "scripts": {
     "clean": "git clean -fdX -- dist",
     "prepare": "npm run clean && mkdir dist && npm run pbjs && npm run pbts && cp src/* dist",
-    "pbjs": "apollo-pbjs --target static-module --out dist/protobuf.js --wrap commonjs --force-number src/reports.proto",
+    "pbjs": "apollo-pbjs --target static-module --out dist/protobuf.js --wrap commonjs --force-number --no-from-object src/reports.proto",
     "pbts": "apollo-pbts -o dist/protobuf.d.ts dist/protobuf.js",
     "update-proto": "curl -sSfo src/reports.proto https://usage-reporting.api.apollographql.com/proto/reports.proto"
   },
@@ -29,6 +29,6 @@
   },
   "homepage": "https://github.com/apollographql/apollo-server#readme",
   "dependencies": {
-    "@apollo/protobufjs": "^1.0.3"
+    "@apollo/protobufjs": "1.2.0"
   }
 }
22 changes: 1 addition & 21 deletions packages/apollo-reporting-protobuf/src/index.js
@@ -3,29 +3,9 @@ const protobufJS = require('@apollo/protobufjs/minimal');

// Remove Long support. Our uint64s tend to be small (less
// than 104 days).
// XXX Just remove this in our fork?
// https://github.com/protobufjs/protobuf.js/issues/1253
protobufJS.util.Long = undefined;
protobufJS.configure();

-// Override the generated protobuf Traces.encode function so that it will look
-// for Traces that are already encoded to Buffer as well as unencoded
-// Traces. This amortizes the protobuf encoding time over each generated Trace
-// instead of bunching it all up at once at sendReport time. In load tests, this
-// change improved p99 end-to-end HTTP response times by a factor of 11 without
-// a casually noticeable effect on p50 times. This also makes it easier for us
-// to implement maxUncompressedReportSize as we know the encoded size of traces
-// as we go.
-const originalTracesAndStatsEncode = protobuf.TracesAndStats.encode;
-protobuf.TracesAndStats.encode = function(message, originalWriter) {
-  const writer = originalTracesAndStatsEncode(message, originalWriter);
-  const encodedTraces = message.encodedTraces;
-  if (encodedTraces != null && encodedTraces.length) {
-    for (let i = 0; i < encodedTraces.length; ++i) {
-      writer.uint32(/* id 1, wireType 2 =*/ 10);
-      writer.bytes(encodedTraces[i]);
-    }
-  }
-  return writer;
-};

module.exports = protobuf;
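
The idea behind pre-encoding, which the commit message says is now handled by the `js_preEncoded` option instead of a hand-written override, can be sketched as below. `PreEncodedList` and `Encoder` are hypothetical names; the encoder is a stand-in for a generated protobuf `encode` call, and the per-element tag and length prefix that protobuf writes for repeated fields are omitted.

```typescript
// Stand-in for a protobuf encoder: any function that turns a message into bytes.
type Encoder<T> = (message: T) => Uint8Array;

class PreEncodedList<T> {
  private readonly chunks: Uint8Array[] = [];
  private bytes = 0;

  constructor(private readonly encode: Encoder<T>) {}

  // Pay the encoding cost now, one message at a time, instead of all at once
  // when the report is sent.
  add(message: T): void {
    const chunk = this.encode(message);
    this.chunks.push(chunk);
    this.bytes += chunk.length;
  }

  // The encoded size is known as we go, which helps enforce a size cap.
  size(): number {
    return this.bytes;
  }

  // At send time only a copy of already-encoded bytes remains.
  concat(): Uint8Array {
    const out = new Uint8Array(this.bytes);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

Spreading the work this way keeps any single event-loop turn short, at the cost of holding encoded buffers in memory until the report is flushed.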
6 changes: 5 additions & 1 deletion packages/apollo-reporting-protobuf/src/reports.proto
@@ -375,6 +375,10 @@ message ContextualizedStats {

 // A sequence of traces and stats. An individual trace should either be counted as a stat or trace
 message TracesAndStats {
-  repeated Trace trace = 1;
+  repeated Trace trace = 1 [(js_preEncoded)=true];
   repeated ContextualizedStats stats_with_context = 2 [(js_use_toArray)=true];
+  // This field is used to validate that the algorithm used to construct `stats_with_context`
+  // matches similar algorithms in Apollo's servers. It is otherwise ignored and should not
+  // be included in reports.
+  repeated Trace internal_traces_contributing_to_stats = 3 [(js_preEncoded)=true];
 }
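
To make the `js_use_toArray` idea concrete, here is a small sketch of a map-like container that is only turned into an array when a repeated field is about to be encoded. `ArrayConvertible`, `KeyedStats`, and `repeatedFieldValues` are hypothetical names for illustration and are not part of `@apollo/protobufjs`.

```typescript
// Anything that can produce an array on demand.
interface ArrayConvertible<T> {
  toArray(): T[];
}

interface KeyedCount {
  key: string;
  count: number;
}

// Hypothetical map-like container: cheap lookup by key while accumulating.
class KeyedStats implements ArrayConvertible<KeyedCount> {
  private readonly byKey = new Map<string, KeyedCount>();

  increment(key: string): void {
    const existing = this.byKey.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      this.byKey.set(key, { key, count: 1 });
    }
  }

  // Conversion to an array happens only here, at encoding time.
  toArray(): KeyedCount[] {
    return Array.from(this.byKey.values());
  }
}

// An encoder for a repeated field could accept either representation.
function repeatedFieldValues<T>(field: T[] | ArrayConvertible<T>): T[] {
  return Array.isArray(field) ? field : field.toArray();
}
```

Accumulating in a keyed structure avoids scanning an array to find the right entry on every operation, while the wire format still sees an ordinary repeated field.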
2 changes: 1 addition & 1 deletion packages/apollo-server-core/src/plugin/traceTreeBuilder.ts
@@ -261,7 +261,7 @@ function errorToProtobufError(error: GraphQLError): Trace.Error {
 }

 // Converts a JS Date into a Timestamp.
-function dateToProtoTimestamp(date: Date): google.protobuf.Timestamp {
+export function dateToProtoTimestamp(date: Date): google.protobuf.Timestamp {
   const totalMillis = +date;
   const millis = totalMillis % 1000;
   return new google.protobuf.Timestamp({