Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transport/http2_server : Move up streamID validation in operate headers #4873

Merged
merged 11 commits into from Nov 2, 2021

Conversation

uds5501
Copy link
Contributor

@uds5501 uds5501 commented Oct 14, 2021

Fixes : #4779

Moved stream ID validation as mentioned in the issue and ensured illegal streamIDs are not used for new stream object creation.

RELEASE NOTES:

  • TBD

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 14, 2021

@menghanl @easwars can you please help me in assigning milestones and labels to this PR?

@uds5501 uds5501 changed the title [Fix] Move up streamID validation in operate headers transport/http2_server : Move up streamID validation in operate headers Oct 14, 2021
@easwars easwars added this to the 1.42 Release milestone Oct 14, 2021
@uds5501
Copy link
Contributor Author

uds5501 commented Oct 15, 2021

@easwars @menghanl Please initiate the GitHub runner workflows.

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 16, 2021

@easwars Could you please start the tests and review the PR? Thank you :D

easwars
easwars previously approved these changes Oct 18, 2021
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@easwars
Copy link
Contributor

easwars commented Oct 18, 2021

Moving to @dfawley for another look. The changes looked OK to me. But I couldn't find an existing test to which we could add a case for this change.

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 18, 2021

@easwars @dfawley
Updates :

  • The maxStreamID was being updated only when processing headers so I removed the locks and started treating it as an atomic var. instead.
  • Extra comments removed

Please run the tests and review, thanks!

@dfawley
Copy link
Member

dfawley commented Oct 18, 2021

The maxStreamID was being updated only when processing headers so I removed the locks and started treating it as an atomic var. instead.

Did you forget to push this commit? I still see the lock being used.

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there is another interaction here that we need to be careful about.

If an outgoing GOAWAY happens at the same time as we are processing an incoming header, we need to make sure that we are consistent: either we reject the stream and we make sure the GOAWAY stream ID includes it as unprocessed, or we process the stream and indicate that the GOAWAY did not omit it.

(I am pretty sure this will require a lock.)

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 18, 2021

@dfawley I actually did forget to push the atomic commits (Few tests were locally failing because of that commit so didn't push).
Also, yes you are right we are going to need a lock for that particular use case.

For now, I have just put the unlock before the assignment of the buffer.

@uds5501 uds5501 requested a review from dfawley October 18, 2021 18:44
@dfawley
Copy link
Member

dfawley commented Oct 18, 2021

For now, I have just put the unlock before the assignment of the buffer.

I believe this still has the race, then. It was avoided before by checking the state of the transport after taking the lock and before deciding whether to allow the stream ID. Now the checks have been broken apart with an unlock between them.

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 19, 2021

@dfawley

It was avoided before by checking the state of the transport after taking the lock and before deciding whether to allow the stream ID.

Right now, I am taking a lock, validating the stream ID, and unlocking. If stream ID is valid then we unlock and a new stream Object is created. If not, it returns the fatal bool. So, should there be a race in this state?

Now the checks have been broken apart with an unlock between them.

Didn't understand this. The checks seem to be passing alright, is there something I am missing?

@dfawley
Copy link
Member

dfawley commented Oct 19, 2021

The checks seem to be passing alright, is there something I am missing?

EDIT: the "checks" I meant were "if conditions", not "our tests". Sorry for the confusion.

Consider:

  1. We receive a new RPC & start processing headers.
  2. We allow the stream ID to pass due to this check:
    https://github.com/uds5501/grpc-go/blob/00f199d6456730b6a5614529c19343daa7b0b83b/internal/transport/http2_server.go#L352
    But then we give up the lock and delay for a moment (races happen).
  3. We send an outgoing GOAWAY and get to here:
  4. We then set the state of the transport to draining, but the GOAWAY indicates we will process the RPC.
  5. We get here and then fail the RPC because of the transport's state:
    if t.state != reachable {

@dfawley
Copy link
Member

dfawley commented Oct 19, 2021

FWIW I believe the fix is as simple as including the transport state check along with the (now earlier) stream ID check, but I may be missing something else that makes that incorrect for another reason.

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 19, 2021

FWIW I believe the fix is as simple as including the transport state check along with the (now earlier) stream ID check, but I may be missing something else that makes that incorrect for another reason.

@dfawley I was thinking on similar lines.
I had one more idea though, what if we keep the transport state where it is and add another layer that if state is draining and maxUserId == current stream Id then it should still process this stream. [I am assuming that after giving GOAWAY we are not accepting any more streams?]

Ultimately making that check to be -

if t.state != reachable && !(t.state == draining && maxStreamID == streamID) {
    t.mu.Unlock()
    s.cancel()
    return false
}

This can keep the check right there if it's really important.
Otherwise clubbing streamId and transport state validation doesn't seem to be a bad idea either (at least from the surface level)

@dfawley
Copy link
Member

dfawley commented Oct 19, 2021

I had one more idea though, what if we keep the transport state where it is and add another layer that if state is draining and maxUserId == current stream Id then it should still process this stream. [I am assuming that after giving GOAWAY we are not accepting any more streams?]

That doesn't seem right. What would stop us from continually accepting new streams once we are in the draining state, since we update maxStreamID on every header we receive?

I just looked, and we can't simply move the state check. If we give up the lock and take it again later, we don't want to then later add the stream ID to activeStreams without first checking the transport state again.

What if we do the stream ID check above, like now, but make the assignment to t.maxStreamID only after checking the state while holding the lock the second time (or on an intervening error)?

@dfawley
Copy link
Member

dfawley commented Oct 21, 2021

Just one small doubt, in handling the GOAWAY, would it be contesting for nested locks?? First the transport layer lock and then to get the sid, this max stream ID lock?

Yes. You will need to make sure you always take t.maxStreamMu without holding t.mu already. If you do that, you know that if you are able to take t.maxStreamMu, you will eventually be able to take t.mu. So, I believe you'll need to change the outgoing goaway function to take t.maxStreamMu at the top.

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 22, 2021

@dfawley completed the changes as requested.

internal/transport/http2_server.go Outdated Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@uds5501
Copy link
Contributor Author

uds5501 commented Oct 26, 2021

@dfawley completed the changes as requested.

@uds5501 uds5501 requested a review from dfawley October 26, 2021 11:38
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think this is almost done. It's too bad a decent test for this race would be ~impossible to write.

internal/transport/http2_server.go Show resolved Hide resolved
internal/transport/http2_server.go Show resolved Hide resolved
internal/transport/http2_server.go Outdated Show resolved Hide resolved
@uds5501
Copy link
Contributor Author

uds5501 commented Oct 26, 2021

@dfawley thank you for your patient reviews, I have completed the changes as requested.

@uds5501 uds5501 requested a review from dfawley October 26, 2021 16:17
@uds5501 uds5501 requested a review from dfawley October 26, 2021 17:58
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@dfawley dfawley assigned menghanl and unassigned uds5501 Oct 26, 2021
@dfawley dfawley requested a review from menghanl October 26, 2021 21:40
@dfawley
Copy link
Member

dfawley commented Oct 26, 2021

Let's wait until the branch cut for 1.42 before merging this PR, since the release is very soon and this is a semi-risky change without anyone specifically asking for it.

@uds5501
Copy link
Contributor Author

uds5501 commented Oct 27, 2021

@dfawley understandable, I wanted to know what's the general order in which issues/features are picked to be implemented? Is there a separate list altogether or they are all there in the issues section of Github?

@dfawley
Copy link
Member

dfawley commented Oct 27, 2021

Everything on our radar is in the issues (or is a bigger project for the team). The "help wanted" label is the best place to look if you are looking for other things to contribute. Thanks!

@menghanl menghanl modified the milestones: 1.42 Release, 1.43 release Oct 29, 2021
@dfawley dfawley merged commit 670c133 into grpc:master Nov 2, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Errors in operateHeaders should count stream ID for the purposes of new stream ID validation
4 participants