fix: track aborted status, do not stringify for tracing #2249
Conversation
to give more perspective - grpc-js typically benches around 23-25k rps for unary calls (I know it's not the best use of it and streams are much more efficient) with 1.7ms avg latency and wildly ranging min/max latency between 0.4ms and 25ms (! - the loop being blocked by GC and other native-side code), but when compared to fastify being able to handle around 100k rps on my machine with avg latency of 0 and max latency of 0.1ms, there is a lot of room for improvement
In the latest commit there are several pieces that have been refactored:

With everything in here I managed to get this to about 43k RPS with unary requests, reliably.
Associated flame graph: [flame graph image]

There are 2 areas of concern:

Not sure what exactly could be done here except for trying out micro-optimizations.
One of the tests failed due to …

in terms of …
packages/grpc-js/src/server-call.ts
Outdated
```ts
stream.on('error', onEnd);

function onData(chunk: Buffer) {
  receivedLength += chunk.length;
```
This changes `byteLength` to `length`. I'm not sure exactly what the difference is, but I think this may be causing the out of bounds access error in one of the tests.
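For what it's worth, the two agree on a `Buffer` and only diverge when measuring a string's encoded size; a quick illustration (plain Node semantics, not code from this PR):

```ts
const buf = Buffer.from('héllo');
console.log(buf.length);             // 6 - number of bytes in the Buffer
console.log(buf.byteLength);         // 6 - identical for a Buffer

const str = 'héllo';
console.log(str.length);             // 5 - UTF-16 code units
console.log(Buffer.byteLength(str)); // 6 - bytes after UTF-8 encoding
```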
@murgatroid99 updated based on the suggestions, let's see how the tests do!
Seems like it's crashing at …
Seems like tests are generally OK. In terms of test targets - kind of odd that it's running Node 10/12 as they are both EOL. Any plans to update that?
Those test results look pretty good. The Windows failure is old and unrelated to this change. #2246 is an open PR to test on more recent versions. Also, it's only Windows that currently doesn't test on any version newer than 12. If you would like this to be published as a patch release on 1.7.x, you'll have to rebase it onto the 1.7.x branch and then you or I can change the branch target. Otherwise it looks good to go.
I have one more experiment with https://www.npmjs.com/package/reusify - it's used in underlying modules for fastify and should yield significant improvements if I can wrap Metadata in a way that doesn't break things. It's kind of tricky to get this done as I don't always know when it's safe to release an object. Maybe it would be a good idea to have a non-wrapped implementation for client calls and a wrapped implementation for the server side, as we can generally release once the request has ended. I'll work on that tomorrow and will rebase to 1.7.x in either case.
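A minimal sketch of what that wrapping could look like, using reusify's `get`/`release` pool API; `PooledMetadata` and its fields are illustrative, not actual grpc-js internals:

```ts
import reusify from 'reusify';
import { Metadata } from '@grpc/grpc-js';

// Illustrative pooled holder; reusify requires a `next` field that it
// uses to keep released instances in a linked list.
class PooledMetadata {
  next: PooledMetadata | null = null;
  metadata: Metadata | null = null;
}

const pool = reusify(PooledMetadata);

// Per request: take a holder from the pool instead of allocating.
const slot = pool.get();
slot.metadata = new Metadata();
// ... use slot.metadata for the lifetime of the RPC ...

// When the object is known to be dead, drop references and recycle.
slot.metadata = null;
pool.release(slot);
```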
It is never safe to release objects that have been passed to user application code. We never know if they have stored it for later reference, and there is no API for them to release it.
I agree, but ultimately the only "quick" way to speed up further is to significantly reduce GC and/or speed up conversion functions. Specifically, this could be an opt-in on the part of the user via some sort of flag; the idea here is that the Metadata lifecycle is tied to the RPC lifecycle - whenever the HTTP/2 streams are fully closed, Metadata objects will get released.
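A sketch of that opt-in idea, reusing the hypothetical pool from the previous snippet and the HTTP/2 stream's 'close' event as the release point:

```ts
import type { ServerHttp2Stream } from 'node:http2';

// Assumes `pool` and `PooledMetadata` from the previous sketch. Recycling
// only happens when the user has opted in, since handlers may have kept
// a reference to the Metadata object.
function acquireForCall(
  stream: ServerHttp2Stream,
  poolingEnabled: boolean
): PooledMetadata {
  const slot = pool.get();
  // ... populate slot.metadata from the incoming headers ...
  if (poolingEnabled) {
    stream.once('close', () => {
      slot.metadata = null;
      pool.release(slot);
    });
  }
  return slot;
}
```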
There is quite a bit of potential complexity here (How should we implement this? Should it only cover …)
eh gotta rebase now ❌
I don't think you want to keep that merge commit. It's probably not going to make the rebase easier.
1. no unused timers, wrap tracing calls to avoid stringifying (see the sketch below)
2. track graceful end of the call and avoid emitting 'cancelled' in such cases
3. remove validate calls in metadata on operations where it's not needed
4. refactor server session stream handlers into separate channelz enabled/disabled handlers
5. refactor message request logic - reduce amount of microtasks generated
6. improve sendStatus a little when there is no metadata involved
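For (1), a sketch of what wrapping the tracing calls can look like; `isTraceEnabled` and `trace` are stand-ins for the library's logging internals, the point being that the stringification cost is only paid when the tracer is actually on:

```ts
// Stand-ins for the real logging functions; names are assumptions.
declare function isTraceEnabled(tracer: string): boolean;
declare function trace(tracer: string, text: string): void;

function traceMetadata(tracer: string, metadata: object): void {
  // Before: trace(tracer, JSON.stringify(metadata)) stringified on every
  // call. After: the expensive formatting only runs when tracing is on.
  if (isTraceEnabled(tracer)) {
    trace(tracer, JSON.stringify(metadata));
  }
}
```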
eh, had to squash it all up. There is a new part around …
```ts
  request: RequestType;
  end: (metadata?: Metadata) => void;
};
export type ServerReadableStream<RequestType, ResponseType> =
```
this got reformatted, I think it's Prettier from the project that forced the styling
packages/grpc-js/src/server-call.ts
Outdated
```ts
}

// to compensate for a fact that cancelled is not always called
this.emit('close');
```
I don't understand this change. I am not aware of any code that is listening for this `close` event. In addition, in what situation would an application care about this event if it has already sent a status?
This is sort of similar to 'cancelled', but it allows you to catch all cases and do unconditional cleanup via `stream.finished`, for instance.

Before that I had to listen to the 'cancelled' event and manually call call.destroy() - without that, 'close' wouldn't trigger.

Might have gone too far with the change though? Happy to revert that as I'm not 100% sure that's how it's supposed to be.
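A sketch of the workaround being described, with hypothetical request/response types; `finished` is Node's standard stream helper:

```ts
import { finished } from 'node:stream';
import type { ServerWritableStream } from '@grpc/grpc-js';

// Hypothetical message types for illustration.
type Req = unknown;
type Res = unknown;

function handle(call: ServerWritableStream<Req, Res>) {
  // Without this, a client cancellation never tears the stream down,
  // so the finished() callback below would not fire for that case.
  call.on('cancelled', () => call.destroy());

  finished(call, () => {
    // unconditional cleanup: timers, subscriptions, open handles, ...
  });
}
```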
Reverted this, and if needed we can do a follow-up on it - so it's just a separate unrelated issue.
If you're handling a stream of messages from the remote end, you should be listening for the `end` event to know when you have received all messages. And conversely, if you are sending a stream of messages, you should call `end()` to indicate that you have finished writing. You shouldn't generally need to listen for the `cancelled` event unless you are doing a long asynchronous operation. I don't know why you would need to delete the call at all.
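A small sketch of that guidance for a bidi handler, with hypothetical `EchoRequest`/`EchoResponse` message types:

```ts
import type { ServerDuplexStream } from '@grpc/grpc-js';

// Hypothetical message types for illustration.
interface EchoRequest { text: string }
interface EchoResponse { text: string }

function echo(call: ServerDuplexStream<EchoRequest, EchoResponse>) {
  call.on('data', (msg: EchoRequest) => {
    call.write({ text: msg.text });
  });
  // 'end' fires once the remote end has sent all of its messages.
  call.on('end', () => {
    // Calling end() tells the client we have finished writing.
    call.end();
  });
  // Only needed when the handler holds long-lived resources.
  call.on('cancelled', () => {
    // release timers, subscriptions, etc.
  });
}
```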
It's a long async operation. Clients can disconnect at any moment due to network change, the app being closed, etc. And once a client disconnects the stream needs to be cleaned up. There is never an 'end' event from either side, always an abrupt disconnect that needs to be cleaned up.
In packages/grpc-js/src/server-call.ts:
```diff
-      this.emit('streamEnd', false);
-      this.sendStatus({code: Status.CANCELLED, details: 'Cancelled by client', metadata: new Metadata()});
+
+      if (!this.statusSent) {
+        this.cancelled = true;
+        this.emit('cancelled', 'cancelled');
+        this.emit('streamEnd', false);
+        this.sendStatus({
+          code: Status.CANCELLED,
+          details: 'Cancelled by client',
+          metadata: null,
+        });
+      }
+
+      // to compensate for a fact that cancelled is not always called
+      this.emit('close');
```
In that case you should still always see the …

Anyway, it looks like the PR is in a good state now, and the tests are passing. Is there anything else you would like to do before we merge it?
LGTM in this current state ;) if you could add the Hacktoberfest label, that would be great.
Thank you for the contribution. I am happy to see some performance improvements in this library.
Regarding Hacktoberfest, repositories opt in to that event this year, and this repository was not opted in until just now, so you almost didn't get credit for this PR. According to the website, you don't actually need that label. It is also sufficient for the PR to be approved or merged.
Can you share the profiling setup you used?
@murgatroid99 pretty much everything is in there; you are likely going to be the most interested in this: https://github.com/AVVS/talks/blob/main/nodeconfeu2022/scaling-nodejs/src/grpc-raw.ts

For serialization I used ts-proto + buf to generate the code (committed to the repo). For flamegraphs I'm using clinic.js - there are a bunch of scripts you can meddle with (i.e. enable/disable kernel tracing). For the load producer I've used ghz, but I think you can also use something like nghttp2, though it's a little bit more involved.
I've been working on improving perf of grpc-js and a few things stood out to me:

1. `Http2ServerCallStream` has an empty timer created, which results in overhead. I assume this was done in an effort to have the same backing map, but performance is gained when not cleaning / creating timeouts needlessly (see the sketch below).
2. track the `aborted` event and, in case the stream is `aborted`, emit `cancelled`; otherwise a graceful stream end shouldn't cause what is ultimately an exception. This allows for improved performance around cancellation logic.

Locally I've had all the tests pass on Node 18, fingers crossed they pass everywhere else. I'm still mostly concerned about (2) as I don't yet fully grasp the logic around this call (checked blame and it was like that for 3 years).
Anyhow, hopefully these changes are useful or could be a starting base for further improvements
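A minimal sketch of the timer point in the first bullet; `CallDeadline` is a made-up helper, the shape being "no deadline, no timer" instead of creating an empty timer for every call:

```ts
// Hypothetical helper: only schedule a timeout when the client actually
// sent a deadline, instead of creating a placeholder timer per call.
class CallDeadline {
  private timer: NodeJS.Timeout | null = null;

  start(timeoutMs: number, onExpired: () => void): void {
    if (!Number.isFinite(timeoutMs)) {
      return; // no deadline requested: skip timer creation entirely
    }
    this.timer = setTimeout(onExpired, timeoutMs);
  }

  clear(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```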