Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rls: delegate pick to child policy as long as it is not in TransientFailure #5656

Merged
merged 2 commits into from Sep 15, 2022

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Sep 14, 2022

Recently, we implemented a no-op ExitIdle method for the ringhash LB policy in #5614.

When RLS is used in conjunction with ringhash though, this causes the channel to never come out of IDLE, as ringhash starts in IDLE and any requests to exit idle from its parent results in a no-op.

The pick behavior in RLS though seems to be the root cause. Instead of delegating the picks to child policies which are not in TransientFailure, RLS was asking them to exit idle if they were in IDLE. This is in contrast to how things are implemented in c-core, which simply delegates the picks to the child policies as long as they are not in TransientFailure, and for a policy like ringhash, the pick ensures that it exits idle.

RELEASE NOTES:

  • rls: fix a bug which was causing the channel to be stuck in IDLE

@easwars easwars added this to the 1.50 Release milestone Sep 14, 2022
// to which this RPC can be routed to. If there is no child policy in READY
// state, we delegate to the first child policy arbitrarily.
// to which this RPC can be routed to. If all child policies are in
// TRANSIENT_FAILURE, we delegate to the first child policy arbitrarily.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we delegate to the last child policy to match C++? (Change itself should be ~trivial; e.g. check if loop iterator (currently _) is len(dcEntry.childPolicyWrappers)-1, in which case don't continue even if TRANSIENT_FAILURE.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it when making the change, but was lazy to make it and thought you wouldn't ask for it ... lol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@easwars easwars merged commit 9c3e589 into grpc:master Sep 15, 2022
1 check passed
@easwars easwars deleted the e2e_with_ring_hash branch September 15, 2022 22:55
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants