Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[round robin] delegate to pick_first instead of creating subchannels directly #32692

Merged
merged 84 commits into from
Jun 27, 2023

Conversation

markdroth
Copy link
Member

@markdroth markdroth commented Mar 23, 2023

More work on the dualstack backend design:

  • Change round_robin to delegate to pick_first instead of creating subchannels directly.
  • Change pick_first such that when it is the child of a petiole policy, it will unconditionally start a health watch.
  • Change the client-side health checking code such that if client-side health checking is not enabled, it will return the subchannel's raw connectivity state.
  • As part of this, we introduce a new endpoint_list library to be used by petiole policies, which is intended to replace the existing subchannel_list library. The only policy that will still directly interact with subchannels is pick_first, so the relevant parts of the subchannel_list functionality have been copied directly into that policy. The subchannel_list library will be removed after all petiole policies are updated to delegate to pick_first.

markdroth added a commit that referenced this pull request Apr 20, 2023
… LB policies (#32709)

This paves the way for making pick_first the universal leaf policy (see
#32692), which will be needed for the dualstack design. That change will
require changing pick_first to see both the raw connectivity state and
the health-checking connectivity state of a subchannel, so that we can
enable health checking when pick_first is used underneath round_robin
without actually changing the pick_first connectivity logic (currently,
pick_first always disables health checking). To make it possible to do
that, this PR moves the health checking code out of the subchannel and
into a separate API using the same data-watcher mechanism that was added
for ORCA OOB calls.
@markdroth markdroth marked this pull request as ready for review June 26, 2023 22:08
@markdroth markdroth requested a review from eugeneo June 26, 2023 22:08
LoadBalancingPolicy::UpdateArgs update_args;
update_args.addresses.emplace().emplace_back(address);
update_args.args = child_args;
// TODO(roth): If the child reports a non-OK status with the update,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it at least be logged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this situation, there's no point in logging it, because a non-OK status isn't actually a real failure; it's really just an indication of whether the resolver should consider the update a failure. We actually have a broader pattern of this issue that occurs in many places, as introduced by #30809. When we eventually find a good solution for this, we'll fix it everywhere it occurs.

In this particular case, it's actually currently impossible for this to return non-OK anyway, because we are creating one endpoint for each address, and the only case where pick_first will return non-OK is if the address list is empty. However, that will change when we add support for multiple addresses per endpoint, so I've added a TODO in round_robin.cc to make sure that we properly handle that case.

// To use this, a petiole policy must define its own subclass of both
// EndpointList and EndpointList::Endpoint, like so:
/*
class MyEndpointList : public EndpointList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Commented out code (I believe linter will point it out too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented-out code is intentional; it's showing an example of how to use this API. I'm not aware of any linter that would consider this a problem.

@markdroth markdroth merged commit 27a778f into grpc:master Jun 27, 2023
64 of 65 checks passed
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Jun 27, 2023
mario-vimal pushed a commit to mario-vimal/grpc that referenced this pull request Jul 13, 2023
…directly (grpc#32692)

More work on the dualstack backend design:
- Change round_robin to delegate to pick_first instead of creating
subchannels directly.
- Change pick_first such that when it is the child of a petiole policy,
it will unconditionally start a health watch.
- Change the client-side health checking code such that if client-side
health checking is not enabled, it will return the subchannel's raw
connectivity state.
- As part of this, we introduce a new endpoint_list library to be used
by petiole policies, which is intended to replace the existing
subchannel_list library. The only policy that will still directly
interact with subchannels is pick_first, so the relevant parts of the
subchannel_list functionality have been copied directly into that
policy. The subchannel_list library will be removed after all petiole
policies are updated to delegate to pick_first.
markdroth added a commit to markdroth/grpc that referenced this pull request Jul 14, 2023
markdroth added a commit that referenced this pull request Jul 14, 2023
This reverts the following PRs: #32692 #33087 #33093 #33427 #33568

These changes seem to have introduced some flaky crashes. Reverting
while I investigate.
mario-vimal pushed a commit to mario-vimal/grpc that referenced this pull request Jul 27, 2023
This reverts the following PRs: grpc#32692 grpc#33087 grpc#33093 grpc#33427 grpc#33568

These changes seem to have introduced some flaky crashes. Reverting
while I investigate.
markdroth added a commit that referenced this pull request Aug 31, 2023
This rolls forward only the pick_first changes from #32692, which were
rolled back in #33718. Specifically:
- Changes PF to use its own subchannel list implementation instead of
using the subchannel_list library, since the latter will be going away
with the dualstack changes.
- As a result of no longer using the subchannel_list library, PF no
longer needs to set the `GRPC_ARG_INHIBIT_HEALTH_CHECKING` channel arg.
- Adds an option to start a health watch on the chosen subchannel, to be
used in the future when pick_first is the child of a petiole policy.
(Currently, this code is not actually called anywhere.)
markdroth added a commit that referenced this pull request Sep 5, 2023
…34222)

Rolls forward part of the dualstack changes, mostly from #33427 and a
little bit from #32692, both of which were reverted in #33718.
Specifically:
- For petiole policies, unconditionally start health watch on
subchannels, even if client side health checking is not enabled; in this
case, the health watch will report the subchannel's raw connectivity
state.
- Fix edge cases in health check reporting that occur when a watcher is
started before the initial state is reported.
- When client-side health checking fails, add the subchannel's address
to the RPC failure status message.
- Outlier detection now works only via the health checking watch, not
via the raw connectivity state watch.
- Remove now-unnecessary hack to ensure that outlier detection does not
work for pick_first.
markdroth added a commit that referenced this pull request Sep 11, 2023
Rolls forward the remaining changes from #32692, which were rolled back
in #33718.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bloat/low imported Specifies if the PR has been imported to the internal repository lang/c++ lang/core lang/Python per-call-memory/neutral per-channel-memory/neutral release notes: no Indicates if PR should not be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants