
fix: track aborted status, do not stringify for tracing #2249

Merged
2 commits merged into grpc:@grpc/grpc-js@1.7.x on Oct 20, 2022

Conversation

@AVVS (Contributor) commented Oct 17, 2022

I've been working on improving the performance of grpc-js and a few things stood out to me:

  1. Each Http2ServerCallStream creates an empty timer, which adds overhead. I assume this was done in an effort to keep the same backing map, but performance is gained by not creating and clearing timeouts needlessly.
  2. Track the aborted event and emit 'cancelled' only when the stream was actually aborted; a graceful stream end shouldn't result in what is ultimately an exception. This allows for improved performance around the cancellation logic.
  3. Wrap the tracing calls, since JSON.stringify + toJSON on metadata is an expensive operation and is currently performed regardless of whether tracing is actually on (see the sketch at the end of this comment).

Locally I've had all the tests pass on Node 18, fingers crossed they pass everywhere else. I'm still mostly concerned about (2), as I don't yet fully grasp the logic around this call (checked blame, and it has been like that for 3 years).

Anyhow, hopefully these changes are useful, or can at least be a starting point for further improvements.
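
A minimal sketch of the idea in item 3. The helpers here are stand-ins, not grpc-js's actual logging API (the library does read tracer names from the GRPC_TRACE environment variable, but isTraceEnabled/trace below are illustrative names):

// Illustrative stand-ins for the internal tracing helpers.
const enabledTracers = new Set(
  (process.env.GRPC_TRACE ?? '').split(',').filter(Boolean)
);

function isTraceEnabled(tracer: string): boolean {
  return enabledTracers.has(tracer) || enabledTracers.has('all');
}

function trace(tracer: string, text: string): void {
  console.error(`${new Date().toISOString()} | ${tracer} | ${text}`);
}

// Before: the log string (and the Metadata serialization inside it) was built
// on every request. After: JSON.stringify only runs when the tracer is on.
function traceReceivedMetadata(
  methodPath: string,
  metadata: { toJSON(): Record<string, unknown> }
): void {
  if (isTraceEnabled('server_call')) {
    trace(
      'server_call',
      `Request to ${methodPath} with metadata ${JSON.stringify(metadata.toJSON())}`
    );
  }
}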

@linux-foundation-easycla (bot) commented Oct 17, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@AVVS (Contributor, Author) commented Oct 17, 2022

To give more perspective: grpc-js typically benches around 23-25k rps for unary calls (I know that's not the best use of it and streams are much more efficient), with 1.7 ms average latency and min/max latency ranging wildly from 0.4 ms to 25 ms (!) as the loop gets blocked by GC and other native-side code. Compared to fastify, which handles around 100k rps on my machine with ~0 ms average latency and 0.1 ms max latency, there is a lot of room for improvement.

@AVVS (Contributor, Author) commented Oct 18, 2022

In the latest commit, several pieces have been refactored:

  1. channelz - the associated session stream handler is selected at startup, which removes a lot of checks whose outcome we already know ahead of time; some code is duplicated, but not too much (see the sketch after this list).
  2. Commented out the validate checks on get / remove - the respective keys would not be there in the first place, so there is no reason to validate, especially on a hot path with plenty of .remove calls.
  3. receiveUnaryMessage refactored to queue fewer microtasks, and to avoid them completely when compression isn't used. This is a little messy, as we introduce a few try/catch blocks and work on a promise directly in the decompression case, but errors are handled correctly and overall it's much more performant.
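
A minimal sketch of the handler-selection idea from item 1; the names and handler bodies are illustrative, not grpc-js internals:

// Pick the stream handler once, at server start, instead of re-checking the
// channelz flag on every incoming request.
type StreamHandler = (stream: object, headers: Record<string, string>) => void;

function selectStreamHandler(channelzEnabled: boolean): StreamHandler {
  if (channelzEnabled) {
    return (_stream, _headers) => {
      // update channelz call counters, then handle the request
    };
  }
  return (_stream, _headers) => {
    // handle the request with no channelz bookkeeping at all
  };
}

// At startup (channelz is toggled via the 'grpc.enable_channelz' option):
// const handler = selectStreamHandler(options['grpc.enable_channelz'] !== 0);
// http2Server.on('stream', handler);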

With everything in here, I managed to get this to about 43k RPS for unary requests, reliably:

ghz --insecure --proto ./src/hello.proto --call helloworld.Greeter.SayHello -d '{"who":"nodeconf"}' -n 3000000 --connections 2 -c 400 --count-errors --timeout 0 0.0.0.0:8080 --cpus=8 --skipFirst=10000

Summary:
  Count:	2990000
  Total:	68.92 s
  Slowest:	37.58 ms
  Fastest:	0.09 ms
  Average:	7.02 ms
  Requests/sec:	43383.58

Response time histogram:
  0.094  [1]      |
  3.842  [135888] |∎∎∎∎∎∎∎∎∎∎∎∎∎
  7.590  [427175] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  11.338 [375947] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  15.087 [49140]  |∎∎∎∎∎
  18.835 [10126]  |∎
  22.583 [789]    |
  26.331 [526]    |
  30.079 [11]     |
  33.827 [183]    |
  37.575 [214]    |

Latency distribution:
  10 % in 3.38 ms
  25 % in 5.13 ms
  50 % in 7.17 ms
  75 % in 8.89 ms
  90 % in 10.45 ms
  95 % in 11.77 ms
  99 % in 15.43 ms

Status code distribution:
  [OK]   2990000 responses

Associated flame graph:

[Screenshot: flame graph, Oct 18 2022]

There are 2 areas of concern:

  • Node.js stream.respond is relatively slow
  • Lots of transforms between http2 headers <-> grpc.Metadata

Not sure what exactly could be done here except trying out micro-optimizations.

@AVVS (Contributor, Author) commented Oct 18, 2022

One of the tests failed due to the node: import prefix; reverted this.

@AVVS (Contributor, Author) commented Oct 18, 2022

In terms of switch vs if: I believe if is faster up to about 10 clauses, based on what I've heard, but that shouldn't matter much here, so I used it anyway.

stream.on('error', onEnd);

function onData(chunk: Buffer) {
  receivedLength += chunk.length;
Member:
This changes byteLength to length. I'm not sure exactly what the difference is, but I think this may be causing the out of bounds access error in one of the tests.
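
For what it's worth, a quick illustration of when the two can differ (my reading, not a claim about what the failing test hits): on a Buffer they agree, but on a string chunk .length counts UTF-16 code units while Buffer.byteLength counts bytes.

const buf = Buffer.from('héllo');
console.log(buf.length);              // 6 (bytes)
console.log(buf.byteLength);          // 6 (bytes) - identical for a Buffer

const str = 'héllo';
console.log(str.length);              // 5 (UTF-16 code units)
console.log(Buffer.byteLength(str));  // 6 (UTF-8 bytes) - differs for strings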

@AVVS (Contributor, Author) commented Oct 18, 2022

@murgatroid99 updated based on the suggestions, let's see how the tests do!

@AVVS (Contributor, Author) commented Oct 18, 2022

Seems like it's crashing at const compressed = requestBytes.readUInt8(0) === 1; meaning the message is empty. I'll see what changed and why this isn't handled anymore.
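
A minimal sketch of the kind of guard this implies (my illustration, not the fix that actually went in): readUInt8(0) throws ERR_OUT_OF_RANGE on an empty buffer, so the empty-message case has to be checked first.

// The first byte of the assembled request is read as the compressed flag;
// guard against an empty buffer before reading it.
function isCompressed(requestBytes: Buffer): boolean {
  if (requestBytes.length === 0) {
    return false;
  }
  return requestBytes.readUInt8(0) === 1;
}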

@AVVS (Contributor, Author) commented Oct 18, 2022

Seems like the tests are generally OK. In terms of test targets, it's kind of odd that they run on Node 10/12, as both are EOL. Any plans to update that?

@murgatroid99 (Member):

Those test results look pretty good. The Windows failure is old and unrelated to this change. #2246 is an open PR to test on more recent versions. Also, it's only Windows that currently doesn't test on any version newer than 12.

If you would like this to be published as a patch release on 1.7.x, you'll have to rebase it onto the 1.7.x branch and then you or I can change the branch target. Otherwise it looks good to go.

@AVVS (Contributor, Author) commented Oct 18, 2022

I have one more experiment with https://www.npmjs.com/package/reusify - it's used in the underlying modules for fastify and should yield significant improvements if I can wrap Metadata in a way that doesn't break things. It's kind of tricky to get right, since I don't always know when it's safe to release an object. Maybe it would be a good idea to have a non-pooled implementation for client calls and a pooled one for the server side, as there we can generally release once the request has ended.

I'll work on that tomorrow and will rebase to 1.7.x in either case
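
A rough sketch of what pooling with reusify looks like; PooledMetadata here is a hypothetical wrapper, not the real grpc-js Metadata class:

const reusify = require('reusify');

// Hypothetical pooled wrapper - reusify chains free instances through `next`.
class PooledMetadata {
  next: PooledMetadata | null = null;
  map = new Map<string, string[]>();
  reset(): void {
    this.map.clear();
  }
}

const pool = reusify(PooledMetadata);

// Acquire an instance for the lifetime of one RPC...
const md = pool.get();
md.map.set('x-request-id', ['abc']);

// ...and hand it back once the HTTP/2 stream is fully closed, so the next
// request reuses the allocation instead of adding GC pressure.
md.reset();
pool.release(md);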

@AVVS (Contributor, Author) commented Oct 18, 2022

[Screenshot: flame graph with kernel tracing, Oct 18 2022]

if you look at the bottom right - this is with kernel tracing on - you can see how much time GC is taking

@murgatroid99 (Member):

It is never safe to release objects that have been passed to user application code. We never know if they have stored it for later reference, and there is no API for them to release it.

@AVVS (Contributor, Author) commented Oct 19, 2022

I agree, but ultimately the only "quick" way to speed up further is to significantly reduce GC and/or speed up the conversion functions. The Metadata class specifically is the one creating the most pressure, and pooling it would eliminate a lot of GC; on each unary call I can see at least 3 such objects allocated.

This could be an opt-in on the part of the user via some sort of flag. The idea is that the Metadata lifecycle is tied to the RPC lifecycle - whenever the http streams are fully closed, the Metadata objects get released.

@murgatroid99 (Member):

There is quite a bit of potential complexity here (How should we implement this? Should it only cover Metadata? Should we also consider the client? etc.), so we should definitely address that in a different PR. It would also be a good idea to have a gRFC proposal to flesh out the details, because it seems like this will require a new API somewhere.

@AVVS AVVS changed the base branch from master to @grpc/grpc-js@1.7.x October 19, 2022 20:54
@AVVS (Contributor, Author) commented Oct 19, 2022

eh gotta rebase now ❌

@murgatroid99 (Member):

I don't think you want to keep that merge commit. It's probably not going to make the rebase easier.

1. no unused timers, wrap tracing calls to avoid stringifying
2. track graceful end of the call and avoid emitting 'cancelled' in such cases
3. remove validate calls in metadata on operations where it's not needed
4. refactor server session stream handlers into separate channelz enabled/disabled handlers
5. refactor message request logic - reduce amount of microtasks generated
6. improve sendStatus a little when there is no metadata involved
@AVVS (Contributor, Author) commented Oct 19, 2022

Eh, had to squash it all up. There is a new part around .emit('close') + .sendStatus accepting null as Metadata. This is to avoid creating these objects, as there were quite a few places where they aren't needed.

  request: RequestType;
  end: (metadata?: Metadata) => void;
};
export type ServerReadableStream<RequestType, ResponseType> =
Contributor (Author):

This got reformatted; I think it's prettier from the project that forced the styling.

}

// to compensate for a fact that cancelled is not always called
this.emit('close');
Member:
I don't understand this change. I am not aware of any code that is listening for this close event. In addition, in what situation would an application care about this event if it has already sent a status?

Contributor (Author):

This is sort of similar to 'cancelled', but it allows you to catch all cases and do unconditional cleanup, via stream.finished for instance.

Before this, I had to listen to the 'cancelled' event and manually call call.destroy() - without that, 'close' wouldn't trigger.

I might have gone too far with the change though? Happy to revert it, as I'm not 100% sure that's how it's supposed to be.
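
For illustration, the kind of unconditional cleanup being described, assuming the call emits 'close' as this change proposed (the handler is my own example, not code from the PR):

import { finished } from 'stream';
import type { ServerWritableStream } from '@grpc/grpc-js';

function handleStream(call: ServerWritableStream<unknown, unknown>): void {
  const timer = setInterval(() => call.write({ tick: Date.now() }), 1000);

  // With a reliable 'close', stream.finished fires on graceful end, error,
  // and cancellation alike, so cleanup can be attached once.
  finished(call, () => clearInterval(timer));
}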

Contributor (Author):

[Two screenshots, Oct 19 2022]

Contributor (Author):

Reverted this; if needed we can do a follow-up, so it's just a separate, unrelated issue.

Member:
If you're handling a stream of messages from the remote end, you should be listening for the end event to know when you have received all messages. And conversely, if you are sending a stream of messages, you should call end() to indicate that you have finished writing. You shouldn't generally need to listen for the cancelled event unless you are doing a long asynchronous operation. I don't know why you would need to delete the call at all.
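
As a quick illustration of that usage (my example, not code from the PR):

import type { ServerDuplexStream } from '@grpc/grpc-js';

// 'end' tells us the client has finished writing; calling end() tells the
// client that we have finished writing.
function echoHandler(call: ServerDuplexStream<unknown, unknown>): void {
  call.on('data', (message) => call.write(message));
  call.on('end', () => call.end());
}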

@AVVS (Contributor, Author) commented Oct 19, 2022 via email

@murgatroid99 (Member):

In that case you should still always see the cancelled event after your change.

Anyway, it looks like the PR is in a good state now, and the tests are passing. Is there anything else you would like to do before we merge it?

@AVVS (Contributor, Author) commented Oct 20, 2022

LGTM in this current state ;) If you could add the hacktoberfest-accepted label after that, it would be grand, too. T-shirts <3

@murgatroid99 (Member) left a review comment:

Thank you for the contribution. I am happy to see some performance improvements in this library.

Regarding Hacktoberfest, repositories opt in to that event this year, and this repository was not opted in until just now, so you almost didn't get credit for this PR. According to the website, you don't actually need that label. It is also sufficient for the PR to be approved or merged.

@murgatroid99 murgatroid99 merged commit b364135 into grpc:@grpc/grpc-js@1.7.x Oct 20, 2022
@murgatroid99 (Member):

Can you share the profiling setup you used?

@AVVS (Contributor, Author) commented Oct 27, 2022

@murgatroid99 pretty much everything is in there; you are likely going to be most interested in this:

https://github.com/AVVS/talks/blob/main/nodeconfeu2022/scaling-nodejs/src/grpc-raw.ts

For serialization I used ts-proto + buf to generate the code (committed to the repo); for flamegraphs I'm using clinic.js - there are a bunch of scripts you can meddle with (e.g. enable/disable kernel tracing).

For the load producer I used ghz, but I think you can also use something like nghttp2; it's just a little more involved.
