
Allow setting max received message size #658

Open
theoribeiro opened this issue May 5, 2023 · 7 comments · May be fixed by #738

Comments

@theoribeiro

I'm getting an error whenever trying to upload a cache using pants with the error message below:
ResourceExhausted: "grpc: received message larger than max (6909987 vs. 4194304)"

This happens because bazel-remote keeps the default grpc max received message size and does not have a way to configure it. I can confirm that setting MaxRecvMsgSize to more than 4MB as an option on the server fixes the issue.
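Roughly, the change I tested looks like this (a minimal sketch using the standard grpc-go server option; the 16 MiB value is just an example, not a recommendation):

import "google.golang.org/grpc"

// Sketch: raise the server's receive limit above the 4 MiB grpc-go
// default. The 16 MiB value here is an arbitrary example.
srv := grpc.NewServer(
    grpc.MaxRecvMsgSize(16 * 1024 * 1024),
)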

Would you be open to a PR adding that as a configurable setting?

@mostynb
Collaborator

mostynb commented May 5, 2023

Hmm, that's a pretty large grpc message for pants to be sending.

I wonder: if bazel-remote were to report 4MB instead of 0/unlimited in CacheCapabilities.MaxBatchTotalSizeBytes, would pants only send messages under that size limit? If so, that might be a better option.
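Roughly what I have in mind, sketched against the remote-apis Go bindings (illustrative only; the real capabilities handler in bazel-remote is more involved):

import pb "github.com/bazelbuild/remote-apis/build/bazel/remote/execution/v2"

// Sketch: advertise a concrete 4 MiB batch limit instead of
// 0/unlimited, so capability-aware clients fall back to the
// ByteStream API for anything larger.
caps := &pb.CacheCapabilities{
    MaxBatchTotalSizeBytes: 4 * 1024 * 1024,
}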

(Also, I wonder if you would consider submitting some documentation for how to set up pants to use bazel-remote?)

@theoribeiro
Author

Very interesting point, @mostynb. I know that pants uses tonic, a Rust grpc client implementation, but from a brief look at the source code, it doesn't look like they're doing anything exotic in terms of message sizes.

However, I just tried setting MaxBatchTotalSizeBytes to 4194304 at server/grpc.go#121 but I still get the same error.

Setting up pants to work with bazel-remote is pretty straightforward. A simple pants.toml config file in your project root will do:

# pants.toml

[GLOBAL]
remote_store_address = "grpc://localhost:9092"
remote_cache_read = true
remote_cache_write = true

@mostynb
Collaborator

mostynb commented May 6, 2023

Thanks for the config example.

I think this is something that the client and server should be able to negotiate automatically. The client should query the server's capabilities to get the maximum size of a grpc message it will accept. If the returned result is 0, the client should choose a reasonable value (no larger than 4MB).
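As a sketch of the client-side logic I'd expect (the function and names are hypothetical, just to pin down the behaviour):

// Hypothetical sketch: pick a batch size limit based on the server's
// advertised CacheCapabilities.MaxBatchTotalSizeBytes.
const defaultBatchLimit = 4 * 1024 * 1024 // reasonable 4 MiB fallback

func batchLimit(advertised int64) int64 {
    if advertised == 0 { // 0 means the server declared no limit
        return defaultBatchLimit
    }
    return advertised
}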

This feels like a pants bug that should be fixed there, unless I misunderstand what's supposed to happen when the server returns 0. Since you tested the server returning 4MB explicitly, I'm leaning towards this being a pants bug. In the meantime, pants has an option that you could try using to control this: --remote-store-batch-api-size-limit=<int>
https://www.pantsbuild.org/docs/reference-global#remote_store_batch_api_size_limit - does using that work?
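For example, in pants.toml (the value is in bytes; 4194304 here just mirrors the grpc-go default):

# pants.toml

[GLOBAL]
remote_store_batch_api_size_limit = 4194304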

@theoribeiro
Author

I'm leaning towards this being a pants bug. In the meantime, pants has an option that you could try using to control this: --remote-store-batch-api-size-limit=
https://www.pantsbuild.org/docs/reference-global#remote_store_batch_api_size_limit - does using that work?

Nope, that doesn't work.

I've had a long conversation over the past few days with the remote cache maintainer for pants and we tried debugging as much as we could. It SEEMS pants is doing everything correctly: it respects remote_store_batch_api_size_limit and uses the streaming API instead. We can't find where the ResourceExhausted message is coming from.

I tried setting

$ export GRPC_GO_LOG_VERBOSITY_LEVEL=99
$ export GRPC_GO_LOG_SEVERITY_LEVEL=info

before running bazel-remote to try to see any errors on the server side, but it doesn't spit out anything.

What's weird is that when MaxRecvMsgSize is set to the default 4MB, pants doesn't seem to send any messages over 4MB when we inspect and decode the traffic with Wireshark.

Would love it if you could enlighten us with some ideas on what to look for.

@mostynb
Collaborator

mostynb commented May 9, 2023

It sounds like pants is sending a grpc message that is too large, and the grpc library on the bazel-remote side returns a ResourceExhausted error with the "received message larger than max" message: https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L611
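The check in question looks roughly like this (paraphrased from grpc-go's rpc_util.go, not copied verbatim):

// Paraphrased from grpc-go: any message whose decoded size exceeds
// the configured MaxRecvMsgSize (4 MiB by default) is rejected with
// exactly the error quoted above.
if len(d) > maxReceiveMessageSize {
    return status.Errorf(codes.ResourceExhausted,
        "grpc: received message larger than max (%d vs. %d)",
        len(d), maxReceiveMessageSize)
}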

I don't think those GRPC_GO_* logging/verbosity settings will help much, because the logic you want to debug is on the client side, i.e. why is pants sending such a large message in the first place?

Perhaps you could try to save and inspect this large grpc message? Is it possible to extract it from Wireshark, or alternatively use some sort of grpc debug proxy to capture the message? If you can figure out what kind of message it is, that might help narrow down where the message could be coming from in pants.

If you have a test case that reproduces this error, I could potentially try it out locally.

@theoribeiro
Author

I just tried forcing tonic to set a maximum message size on the client and I'm still getting the same error. I'll try to set up a repro for you this week. Hopefully you can take a look and help me figure out what's going on!

As a workaround, I've been running a bazel-remote fork that sets MaxRecvMsgSize, without any further issues.

@jasonwbarnett linked a pull request on Mar 15, 2024 that will close this issue
@jasonwbarnett

jasonwbarnett commented Mar 15, 2024

@mostynb I dug deeper into this issue (documented here: pantsbuild/pants#20674). I believe allowing folks to tune the gRPC server is valuable, even if it isn't absolutely necessary; as you pointed out, there are technically other ways to solve this on the client side.

I opened up #738 with a few tweaks to the work @theoribeiro did in their fork.
