Sending large complex message limits throughput #1879
Comments
My understanding is that you want to do the message encoding on your own and send the encoded bytes directly. One trivial way to do this is to define proto messages that contain only bytes. There will still be some encoding going on in protobuf, but the overhead should be low. Another way is to create and register your own byte-slice codec (how to use codec); the codec you define should be similar to this. For example:

```go
bytes := encode(msg)
stream.(grpc.ClientStream).SendMsg(&bytes)
```
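A minimal, self-contained sketch of such a byte-slice codec. The method set matches grpc-go's `encoding.Codec` interface, so a type like this could be registered with `encoding.RegisterCodec`; the name `rawCodec` and the assumption that messages are passed as `*[]byte` are illustrative, not part of the library:

```go
package main

import (
	"errors"
	"fmt"
)

// rawCodec passes pre-encoded []byte payloads through untouched.
// Its method set matches grpc-go's encoding.Codec interface, so a
// value of this type could be registered with encoding.RegisterCodec
// (assumption: callers send and receive messages as *[]byte via
// stream.SendMsg / stream.RecvMsg).
type rawCodec struct{}

func (rawCodec) Marshal(v interface{}) ([]byte, error) {
	b, ok := v.(*[]byte)
	if !ok {
		return nil, errors.New("rawCodec: expected *[]byte")
	}
	return *b, nil
}

func (rawCodec) Unmarshal(data []byte, v interface{}) error {
	b, ok := v.(*[]byte)
	if !ok {
		return errors.New("rawCodec: expected *[]byte")
	}
	*b = data
	return nil
}

// Name identifies the codec on the wire; clients would select it via
// the grpc-encoding content subtype (e.g. grpc.CallContentSubtype).
func (rawCodec) Name() string { return "rawbytes" }

func main() {
	c := rawCodec{}
	payload := []byte("pre-encoded message")
	wire, err := c.Marshal(&payload)
	if err != nil {
		panic(err)
	}
	var out []byte
	if err := c.Unmarshal(wire, &out); err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```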
Thanks @menghanl. Has anyone else done this? Maybe this isn't really the problem. When I close the stream from the other side, my benchmark code speeds up dramatically. Is there no notion of buffering on SendMsg, or is it tying up the machine? I will try marshaling to bytes and see if this makes it faster in testing.
Okay, I implemented a custom codec that overrides the proto codec. Basically, if it detects a []byte it passes it through untouched; anything else goes to the real proto codec. This had some effect, but not as much as I was hoping. It just seems like there's a limit to the throughput of the stream. Is this normal? Should I only expect so much and open multiple streams?
I'm streaming lots of 4k messages. Do I need to pack more into the same message to make it faster? I was hoping that using a stream would negate the need for larger messages. The encoding/compression also seems to cause performance issues: if I enable compression I get approximately a 20% decrease in message throughput. I looked at the code for a little bit. I'm certainly not an expert, and I'm sure it's not this simple (it never is), but would it be possible to make Send thread safe, handle message marshaling/compressing asynchronously, and have a small channel/buffer so that messages can get queued up and sent more efficiently on the wire?
To illustrate my point, I did some more testing (I can provide code examples if you need). I created a simple bidirectional stream whose message had one field, a single google.protobuf.Value. I opened the stream and the server sent that single message as fast as it could. Using the jsonpb library to fill that value with a single string (from JSON) that was about 4k when marshaled, I was able to send the message 36k times per second. I then used jsonpb to fill the value with a complex struct generated from https://www.json-generator.com/. When converted to protobuf bytes it was also about 4k, but I was only able to send it about 3.5k times per second. Same message size on the wire, 10 times slower.

This is having a dramatic effect on my throughput. I have opened multiple connections to the same server to try to increase the speed, but beyond about 2 connections it makes no difference. With a complex protobuf message producing about a 4.3k payload (after marshaling to []byte), I can only hit about 6k messages per second, or 25 MB/s, from a client/server on the same host over multiple streams.
Hey @snowzach, can you benchmark how long it takes to marshal/unmarshal both kinds of messages that you have? From gRPC's point of view, performance should be the same for a 4k-byte message. If the bottleneck boils down to the cost of serialization and deserialization, then it might be a more suitable question for the protobuf team. As a workaround, you may try the following if it helps:
@MakMukhi I have been playing around with exactly what you suggest, and it makes a huge difference in performance. I get that the protobuf I am using is complicated. The point I am trying to make is that the Send function of gRPC shouldn't be single-threaded (for a stream especially) if it's going to include serialization (or compression). I also have the same problem with compression: since it takes a small amount of time to compress each message, my messages-per-second throughput drops dramatically, and I suspect it's because the send queue is getting starved for data. If Send were multi-threaded and handled everything it could (encoding/compression) simultaneously before sending the data, it could be much faster.
@snowzach I've implemented a change because I've observed the same behaviour, and I'm hoping it gets picked up. What I found was that locking around Send() becomes a bottleneck because of the compression and serialisation inside Send; I found this particularly vexing for high-volume bidirectional streaming. The change is at #2356 (issue #2355) if you're keen to take a look and see if there are any commonalities in our respective situations.
@steve-gray I ended up implementing my own stream wrapper along with my own intelligent protobuf codec. Essentially, when handed a []byte, my codec passes it through on the assumption that it has already been marshaled from a protobuf struct. Then I implemented a buffered wrapper around the send function that handles marshaling and unmarshaling protobuf structs to/from []byte in parallel, and then uses SendMsg to move them on. Using this, I've massively increased the throughput.
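A minimal sketch of that kind of pipelining, with illustrative names only: `marshal` stands in for proto serialization and `send` for `stream.SendMsg` behind a pass-through codec. Messages are marshaled concurrently, but a single goroutine drains the results in order so the wire still sees them in sequence:

```go
package main

import "fmt"

// pipelineSend marshals messages on separate goroutines while the caller
// drains the results in submission order and hands them to send. A small
// buffered channel of per-message result channels keeps the sender busy
// without reordering the stream.
func pipelineSend(msgs []string, marshal func(string) []byte, send func([]byte)) {
	results := make(chan chan []byte, 16)
	go func() {
		for _, m := range msgs {
			ch := make(chan []byte, 1)
			m := m
			go func() { ch <- marshal(m) }() // marshal concurrently...
			results <- ch                    // ...but preserve send order
		}
		close(results)
	}()
	for ch := range results {
		send(<-ch) // single sender: no lock contention on the stream
	}
}

func main() {
	var out []string
	pipelineSend(
		[]string{"a", "b", "c"},
		func(s string) []byte { return []byte(s) },
		func(b []byte) { out = append(out, string(b)) },
	)
	fmt.Println(out) // order is preserved despite parallel marshaling
}
```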
@snowzach, @steve-gray, @suyashkumar, or anyone stumbling on this issue: We have an experimental feature implemented in #2560 (from the proposal in #2432) that may alleviate the related concerns here, and we'd be happy if you could test it out. Specifically, we're interested in:
This issue is labeled as requiring an update from the reporter, and no update has been received after 7 days. If no update is provided in the next 7 days, this issue will be automatically closed.
@dfawley here's a different use case for which I can provide feedback: we have an embedded device that publishes sensor data as a server-streaming RPC. A microcontroller reads the sensors, serializes the values into a protobuf message and writes the result on a serial interface. The gRPC server (running on a single-core CPU) reads the messages from the serial port and passes them to all connected clients. Unfortunately the server wastes a lot of resources doing unnecessary work:
This limits the frequency at which the server can publish sensor data. The PreparedMsg API goes in the right direction for us but fails on the following points:
It would be great if there was a way to wrap existing serialized data in a PreparedMsg (and support for sending PreparedMsg from the server). |
@knuesel you should look into a custom codec instead of PreparedMsg for this use case: https://github.com/grpc/grpc-go/blob/master/Documentation/encoding.md If you need the same server to be able to handle both pre-encoded and un-encoded data, the codec could first do a type assertion: if the message passed in is already a []byte, it can be passed through as-is; otherwise, it can fall back to normal proto encoding.
@dfawley I did make a quick attempt at using a custom codec on the server, but found out that I would need to implement the codec in the clients too. That would be significant work in our case as we have many different clients, currently in C++, Go, Dart, Python, Java and C#. Certainly doable but a lot of code to write/maintain for an optimization that only concerns the server. Hence my interest in the PreparedMsg solution :-). |
You shouldn't need to implement a custom codec on the client, as long as the message is of the right type & encoding when it is sent from the server. If you're worried about the name of the codec affecting the client's behavior, you should override the "proto" codec on the server. |
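A sketch of that suggestion, with hypothetical names: `smartCodec` passes pre-encoded `*[]byte` through and delegates everything else to a fallback. In a real server the fallback would be the proto codec (obtainable via `encoding.GetCodec("proto")`), and registering this type under the name "proto" would override the default; here a JSON fallback stands in so the example is self-contained:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshaler is the subset of grpc-go's encoding.Codec this sketch needs.
type marshaler interface {
	Marshal(v interface{}) ([]byte, error)
	Unmarshal(data []byte, v interface{}) error
}

// smartCodec passes pre-encoded *[]byte payloads through untouched and
// delegates everything else to the fallback codec. Registered under the
// name "proto", it would override grpc-go's default proto codec, so
// clients need no changes (assumption: the server sends pre-encoded
// messages as *[]byte).
type smartCodec struct{ fallback marshaler }

func (c smartCodec) Marshal(v interface{}) ([]byte, error) {
	if b, ok := v.(*[]byte); ok {
		return *b, nil // already serialized: skip encoding entirely
	}
	return c.fallback.Marshal(v)
}

func (c smartCodec) Unmarshal(data []byte, v interface{}) error {
	if b, ok := v.(*[]byte); ok {
		*b = data // hand the raw bytes back to the caller
		return nil
	}
	return c.fallback.Unmarshal(data, v)
}

func (c smartCodec) Name() string { return "proto" }

// jsonFallback stands in for the real proto codec in this demo.
type jsonFallback struct{}

func (jsonFallback) Marshal(v interface{}) ([]byte, error)   { return json.Marshal(v) }
func (jsonFallback) Unmarshal(d []byte, v interface{}) error { return json.Unmarshal(d, v) }

func main() {
	c := smartCodec{fallback: jsonFallback{}}
	raw := []byte("pretend these are pre-encoded proto bytes")
	out, _ := c.Marshal(&raw)
	fmt.Println(len(out) == len(raw)) // pass-through path
	enc, _ := c.Marshal(map[string]string{"k": "v"})
	fmt.Println(string(enc)) // fallback path
}
```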
@knuesel check out https://github.com/snowzach/protosmart If you have trouble with it, let me know. I'll try to help. Like I said, it's been a while since I used it. |
@dfawley many thanks! It works indeed when I override the proto codec using the |
@snowzach this looks very nice, thanks! I'll try it as soon as possible. |
I think this can be resolved. If there is more to do here, please let us know. |
Please answer these questions before submitting your issue.
What version of gRPC are you using?
1.10
What version of Go are you using (`go version`)?
1.9.2
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
I have a bidirectional stream opened sending large complicated messages.
What did you expect to see?
Super Fast
What did you see instead?
Only kinda fast
So I've been looking at the code, and for streams you can only call Send() one at a time. The message I am sending is very large and complex (it includes things like Structs with many levels). As part of the Send operation it encodes the message, which takes some time. Meanwhile, I cannot call Send from any other threads, so my operations look like Encode, Send, Encode, Send...
I think this may be severely limiting how much data I can send over the gRPC stream. It would be nice if I could somehow bypass the encoding stage (so I can do it in parallel), so that when I call Send it is literally only sending the message, not tying up the sender with encoding.
Perhaps there could be some mechanism that serializes only the final send operation, so that Send could be called from multiple threads and encoding and compression could be handled in parallel?