
feat: ability to directly write the gRPC frame to the ServerWritableStream #2652

Closed

Conversation

mishimastar

Hello!
For high-load use cases, I suggest adding the ability to write a pre-serialized protobuf message or a complete gRPC frame directly to the ServerWritableStream.

My use case:

I have a service that stores data in memory, shares it with its clients, and keeps their caches consistent via update notifications, similar to etcd watch.

Its distinguishing feature is thousands of clients that can frequently make a "watch" request (a server-streaming RPC) with a large first response followed by infrequent small responses.

Since data updates happen quite infrequently, it makes sense to cache gRPC response frames instead of recalculating them each time.

For now I have implemented this as a hack in my code, and it has saved me a lot of CPU time and RAM, especially ArrayBuffers. Response time has been reduced tenfold for large messages. (Along with serialization and memory allocations, I got rid of the long iterative process of collecting the requested data from my cache.)
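The caching idea can be sketched roughly like this (all names here are hypothetical, not grpc-js API; the 5-byte header is the standard gRPC frame prefix: a 1-byte compression flag followed by a 4-byte big-endian message length):

```typescript
// Hypothetical sketch: cache pre-framed gRPC messages so repeated
// watch responses skip serialization and framing entirely.
function frameMessage(serialized: Buffer): Buffer {
    const out = Buffer.allocUnsafe(serialized.length + 5);
    out.writeUInt8(0, 0);                    // compression flag: off
    out.writeUInt32BE(serialized.length, 1); // payload length, big-endian
    serialized.copy(out, 5);
    return out;
}

// Frames are cached by key and rebuilt only when the underlying data changes.
const frameCache = new Map<string, Buffer>();

function getFrame(key: string, serialize: () => Buffer): Buffer {
    let frame = frameCache.get(key);
    if (frame === undefined) {
        frame = frameMessage(serialize());
        frameCache.set(key, frame);
    }
    return frame;
}
```

On a cache hit, both the protobuf encode and the frame allocation are skipped, which is where the CPU and allocation savings described above come from.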


A bit of the code that prompted me to create this pull request:

service SomeService {
    rpc Watch(WatchRequest) returns (stream WatchResponse);
}

I used this extension of available types and created a function to skip serialization at runtime.

type HackBuf = {
    write(chunk: WatchResponse): boolean;
    write(chunk: WatchResponse | Buffer, encoding: 'buffer'): boolean;
};
type HackHttp2Buf = { call?: { serializeMessage: (r: WatchResponse) => Buffer } };
export type HackedWritableBuf = ServerWritableStream<WatchRequest, WatchResponse> & HackBuf & HackHttp2Buf;

const createFakeSerializer = (ctx: HackHttp2Buf): ((r: WatchResponse | Buffer) => Buffer) => {
    const orig = ctx.call!.serializeMessage;
    const bound = orig.bind(ctx.call);
    // Pass pre-serialized Buffers through untouched; serialize everything else.
    return (r) => (Buffer.isBuffer(r) ? r : bound(r));
};

Using a serializer generated from protobufjs and the serializeMessage method on ServerWritableStream, I got this monster:

const SerializeWatchResponse = (v: WatchResponse): Buffer => {
    const ww = new Writer();
    // Reserve 5 bytes for the gRPC frame header (each uint32(0) emits one zero byte).
    ww.uint32(0).uint32(0).uint32(0).uint32(0).uint32(0);
    const out = Buffer.from(WatchResponse.encode(v, ww).finish());
    out.writeUInt8(0, 0);                  // compression flag: off
    out.writeUInt32BE(out.length - 5, 1);  // message length, big-endian
    return out;
};

It performs only one buffer allocation instead of the two needed when the gRPC frame serializer and the protobuf message serializer are separate.

So now we can swap the serializer and write a Buffer instead of a JS object.

const watch = (call: HackedWritableBuf) => {
    // ... some code ...
    call.call!.serializeMessage = createFakeSerializer(call);
    const frame = SerializeWatchResponse(WatchResponseExample);
    call.write(frame, 'buffer');
    // ... some code ...
};

So I tried to make it less hacky. I am not entirely sure whether it is worth allowing not only a gRPC frame but also a protobuf message (or any other format carried in a Buffer) to be passed to the write method, since the two are difficult to distinguish from each other; checking the first five bytes is not a reliable way to tell.

Thanks!


linux-foundation-easycla bot commented Jan 30, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: mishimastar / name: Sergey Saltykov (2725ea2)

@mishimastar mishimastar force-pushed the server-streaming-write-buffer branch 2 times, most recently from bef964a to f338988 on January 30, 2024 18:06
@mishimastar mishimastar reopened this Jan 30, 2024
@murgatroid99
Member

I think the biggest problem here is going to be that it is already valid to define a service handler that sends Buffers, which would then be passed to the serializer and framed as usual. This change would create a pitfall causing buffers with a specific byte pattern to go through a different code path and send the wrong message.

I would prefer that we consider other approaches for optimizing. A CPU profile would be helpful, but I can see a few possible approaches:

  • If the main issue is the serialization, you could define a custom serializer that would pass through pre-serialized buffers.
  • If the main issue is the buffer copy, one possibility would be to write the gRPC message header and the message content to the HTTP2 stream separately. The HTTP2 stream class supports the _writev operation, so it should be able to consolidate those into a single DATA frame.
  • If the main issue is the cache access, then that's probably something you should fix in your own application before we start changing things in this library.
  • Otherwise, the problem would probably need more consideration.
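The first suggestion above can be sketched like this (hypothetical names; `encodeWatchResponse` stands in for a protobufjs-generated encoder, and only the serializer contract of "function returning a Buffer" is assumed):

```typescript
// Hypothetical sketch of a pass-through serializer: pre-serialized
// Buffers go through untouched, plain objects are encoded as usual.
type WatchResponse = { id: number; data: string };

// Placeholder for a real protobufjs-generated encoder.
const encodeWatchResponse = (r: WatchResponse): Buffer =>
    Buffer.from(JSON.stringify(r));

const passThroughSerializer = (r: WatchResponse | Buffer): Buffer =>
    Buffer.isBuffer(r) ? r : encodeWatchResponse(r);
```

With such a serializer registered for the method, the handler can cache and re-send serialized payloads without touching the framing code path at all.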

@mishimastar
Author

> I think the biggest problem here is going to be that it is already valid to define a service handler that sends Buffers, which would then be passed to the serializer and framed as usual. This change would create a pitfall causing buffers with a specific byte pattern to go through a different code path and send the wrong message.

As I understand gRPC's principles, developers are free to choose the payload type and write a serializer for it themselves; the main thing is that the client and server agree on how to serialize and deserialize it. I understand that I am proposing not only to allow the developer to pass an arbitrary Buffer as a payload, which still more or less satisfies gRPC's requirements, but also to allow the developer to generate and transmit a gRPC frame independently.

To abstract away from the problems in my application, I wrote a small service and a client for it and published them. Based on these, I ran some performance tests to confirm that my proposal makes sense.

I implemented 3 ways of writing:

  • vanilla way: writing a JS object
  • custom protobuf serializer: as you suggested
  • custom gRPC frame serializer

Could you please look at the benchmark conditions and results described in the readme?

Thank you!

P.S. If you want to dive into the code:
There are three branches in the repository. In the first, a JS object is passed to the Writable; in the second, a Buffer containing the protobuf message; in the third, a Buffer containing the gRPC frame. All the code differences between these branches are in the src/service.ts file.


@murgatroid99
Member

I'll take it as a given that you see a substantial performance improvement from caching framed messages vs unframed serialized messages. However, there is still more potential nuance here, because there are two operations in framing that are potentially costly: allocating the buffer to contain the full framed message, and copying the serialized buffer into the framed message buffer. I would prefer to try out optimizations to that process before going in the direction of API changes.

In addition, I don't think you quite understood my objection to this change. The freedom to choose the payload type is the cause of the problem, because Buffer is a valid payload type. With the logic you have here, those payloads will always bypass the serializer, and they can be interpreted as a framed message even if the application developer did not intend that.

@mishimastar
Author

> allocating the buffer to contain the full framed message, and copying the serialized buffer into the framed message buffer. I would prefer to try out optimizations to that process before going in the direction of API changes.

I thought in this direction and seem to have found a suitable solution.
We would change the interfaces of serializeMessage and write from:

interface Http2ServerCallStream<RequestType, ResponseType> {
    ...
    serializeMessage(value: ResponseType | Buffer): Buffer
    write(chunk: Buffer): boolean | undefined
    ...
}

to:

interface Http2ServerCallStream<RequestType, ResponseType> {
    ...
    serializeMessage(value: ResponseType | Buffer): { infoBytes: Buffer; payload: Buffer }
    write(chunk: { infoBytes: Buffer; payload: Buffer }): boolean | undefined
    ...
}

So in that case serializeMessage still performs 2 memory allocations, but the total allocated memory will be N+5 bytes instead of 2N+5 (where N is the payload length in bytes), and we avoid copying from one Buffer to another.
write should be updated too:

class Http2ServerCallStream {
    // ...
    serializeMessage(value: ResponseType): { infoBytes: Buffer; payload: Buffer } {
        const payload = this.handler.serialize(value);
        const infoBytes = Buffer.allocUnsafe(5);
        infoBytes.writeUInt8(0, 0);                      // compression flag: off
        infoBytes.writeUInt32BE(payload.byteLength, 1);  // payload length, big-endian
        return { infoBytes, payload };
    }

    // ...

    write(chunk: { infoBytes: Buffer; payload: Buffer }) {
        if (this.checkCancelled()) {
            return;
        }

        const { infoBytes, payload } = chunk;
        if (
            this.maxSendMessageSize !== -1 &&
            payload.length + infoBytes.length > this.maxSendMessageSize
        ) {
            this.sendError({
                code: Status.RESOURCE_EXHAUSTED,
                details: `Sent message larger than max (${payload.length + infoBytes.length} vs. ${this.maxSendMessageSize})`,
            });
            return;
        }

        this.sendMetadata();
        this.emit('sendMessage');
        const d1 = this.stream.write(infoBytes);
        // ignoring highWaterMark and the drain event because for the library user there must be only one write
        const d2 = this.stream.write(payload);
        return d1 && d2;
    }
    // ...
}

I implemented this solution as a hack in a 4th branch and ran performance tests. The combination of this solution and caching unframed serialized messages showed performance very close to the case of caching framed messages.
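From the caller's side, the combination would look roughly like this (hypothetical names; the stream interface is reduced to the single method used, and only the standard 5-byte gRPC frame header is assumed):

```typescript
// Hypothetical sketch of the two-write proposal: cache the unframed
// serialized payload, build only the tiny 5-byte header per write,
// and hand header and payload to the stream as two separate writes.
function makeHeader(payloadLength: number): Buffer {
    const infoBytes = Buffer.allocUnsafe(5);
    infoBytes.writeUInt8(0, 0);                // compression flag: off
    infoBytes.writeUInt32BE(payloadLength, 1); // payload length, big-endian
    return infoBytes;
}

interface WritableLike {
    write(chunk: Buffer): boolean;
}

function writeFramed(stream: WritableLike, payload: Buffer): boolean {
    const d1 = stream.write(makeHeader(payload.length));
    const d2 = stream.write(payload);
    return d1 && d2;
}
```

Because the payload Buffer is never copied into a framed buffer, the per-write allocation is 5 bytes regardless of message size, which matches the N+5 vs 2N+5 comparison above.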

If you have no objection to this approach, I will rename this pull request or create a new one with this solution, and the need to discuss changes to the library API will disappear.

Thank you!


| Message type | Cache type (link to handler) | Details | Small folder | Medium folder | Large folder |
|---|---|---|---|---|---|
| mono | protobuf-cache-2writes | link | ~1% CPU for 0.5 min; ~200 MB RAM for ... | ~3% CPU for 1 min; ~200 MB RAM for ... | ~12% CPU for 0.25 min; ~4% CPU for 2 min; ~205 MB RAM for ... |
| separated | protobuf-cache-2writes | link | ~8% CPU for 0.5 min; ~240 MB RAM for 0.25 min | ~12% CPU for 1 min; ~235 MB RAM for 2.5 min | ~25% CPU for 0.5 min; ~10% CPU for 2 min; ~230 MB RAM for ... |

P.S.

> With the logic you have here, those payloads will always bypass the serializer, and they can be interpreted as a framed message even if the application developer did not intend that.

Thanks for the clarification, it's clear to me now.

@murgatroid99
Member

Yes, that is the kind of optimization I was talking about.

I want to point out that #2650 removed and replaced Http2ServerCallStream, so you will need to pull the current version and then make a similar change to the BaseServerInterceptingCall class. That might be easier with a new branch, and a new PR, but you can decide how you want to handle that.

@mishimastar
Author

> make a similar change to the BaseServerInterceptingCall class

We can continue in #2658

@mishimastar mishimastar closed this Feb 2, 2024