Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fix race between client-side stream cancellation and compressed server data arriving #3054

Merged
merged 3 commits into from Oct 1, 2019

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Sep 27, 2019

Fixes #3053

transport/Stream.RecvCompress returns what the header contains, if present,
or empty string if a context error occurs. However, it "prefers" the header
data even if there is a context error, to prevent a related race. What happens
here is:

  1. RPC starts.

  2. Client cancels RPC.

  3. RecvCompress tells ClientStream.Recv that compression used is "" because
    of the context error. as.decomp is left nil, because there is no
    compressor to look up in the registry.

  4. Server's header and first message hit client.

  5. Client sees the header and message and allows grpc's stream to see them.
    (We only provide context errors if we need to block.)

  6. Client performs a successful Read on the stream, receiving the gzipped
    payload, then checks as.decomp.

  7. We have no decompressor but the payload has a bit set indicating the message
    is compressed, so this is an error. However, when forming the error string,
    RecvCompress now returns "gzip" because it doesn't need to block to get
    this from the now-received header. This leads to the confusing message
    about how "gzip" is not installed even though it is.

This change makes waitOnHeader close the stream when context cancellation happens.
Then RecvCompress uses whatever value is present in the stream at that time, which
can no longer change because the stream is closed. Also, this will be in sync with
the messages on the stream - if there are any messages present, the headers must
have been processed first, and RecvCompress will contain the proper value.

…ed server data arriving

`transport/Stream.RecvCompress` returns what the header contains, if present,
or empty string if a context error occurs.  However, it "prefers" the header
data even if there is a context error, to prevent a related race.  What happens
here is:

1. RPC starts.

2. Client cancels RPC.

3. `RecvCompress` tells `ClientStream.Recv` that compression used is "" because
   of the context error.  `as.decomp` is left nil, because there is no
   compressor to look up in the registry.

4. Server's header and first message hit client.

5. Client sees the header and message and allows grpc's stream to see them.
   (We only provide context errors if we need to block.)

6. Client performs a successful `Read` on the stream, receiving the gzipped
   payload, then checks `as.decomp`.

7. We have no decompressor but the payload has a bit set indicating the message
   is compressed, so this is an error.  However, when forming the error string,
   `RecvCompress` now returns "gzip" because it doesn't need to block to get
   this from the now-received header.  This leads to the confusing message
   about how "gzip" is not installed even though it is.

This change makes `RecvCompress` return an error instead of empty string (which
is also a valid response), and makes `ClientStream.Recv` return an error when
this happens.  This effectively terminates the stream and prevents subsequent
operations.  This results in 10k/10k passing runs (previously observed failure
rate of ~1/100).
@dfawley dfawley added this to the 1.25 Release milestone Sep 27, 2019
@dfawley dfawley assigned menghanl and Vizerai and unassigned Vizerai Sep 27, 2019
internal/transport/transport.go Outdated Show resolved Hide resolved
test/context_canceled_test.go Outdated Show resolved Hide resolved
internal/transport/transport.go Outdated Show resolved Hide resolved
internal/transport/transport.go Outdated Show resolved Hide resolved
@dfawley dfawley merged commit 663e4ce into grpc:master Oct 1, 2019
@dfawley dfawley deleted the flake branch October 1, 2019 17:48
@lock lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: CancelWhileRecvingWithCompression
3 participants