-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[round robin] delegate to pick_first instead of creating subchannels directly #32692
Conversation
…nt_data_watcher_new
…nt_data_watcher_new
…nt_data_watcher_new
…nt_data_watcher_new
…nt_data_watcher_new
… LB policies (#32709) This paves the way for making pick_first the universal leaf policy (see #32692), which will be needed for the dualstack design. That change will require changing pick_first to see both the raw connectivity state and the health-checking connectivity state of a subchannel, so that we can enable health checking when pick_first is used underneath round_robin without actually changing the pick_first connectivity logic (currently, pick_first always disables health checking). To make it possible to do that, this PR moves the health checking code out of the subchannel and into a separate API using the same data-watcher mechanism that was added for ORCA OOB calls.
…_ejection_via_health_watch
LoadBalancingPolicy::UpdateArgs update_args; | ||
update_args.addresses.emplace().emplace_back(address); | ||
update_args.args = child_args; | ||
// TODO(roth): If the child reports a non-OK status with the update, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it at least be logged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this situation, there's no point in logging it, because a non-OK status isn't actually a real failure; it's really just an indication of whether the resolver should consider the update a failure. We actually have a broader pattern of this issue that occurs in many places, as introduced by #30809. When we eventually find a good solution for this, we'll fix it everywhere it occurs.
In this particular case, it's actually currently impossible for this to return non-OK anyway, because we are creating one endpoint for each address, and the only case where pick_first will return non-OK is if the address list is empty. However, that will change when we add support for multiple addresses per endpoint, so I've added a TODO in round_robin.cc to make sure that we properly handle that case.
// To use this, a petiole policy must define its own subclass of both | ||
// EndpointList and EndpointList::Endpoint, like so: | ||
/* | ||
class MyEndpointList : public EndpointList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Commented out code (I believe linter will point it out too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commented-out code is intentional; it's showing an example of how to use this API. I'm not aware of any linter that would consider this a problem.
…directly (grpc#32692) More work on the dualstack backend design: - Change round_robin to delegate to pick_first instead of creating subchannels directly. - Change pick_first such that when it is the child of a petiole policy, it will unconditionally start a health watch. - Change the client-side health checking code such that if client-side health checking is not enabled, it will return the subchannel's raw connectivity state. - As part of this, we introduce a new endpoint_list library to be used by petiole policies, which is intended to replace the existing subchannel_list library. The only policy that will still directly interact with subchannels is pick_first, so the relevant parts of the subchannel_list functionality have been copied directly into that policy. The subchannel_list library will be removed after all petiole policies are updated to delegate to pick_first.
…hannels directly (grpc#32692)" This reverts commit 27a778f.
This reverts the following PRs: grpc#32692 grpc#33087 grpc#33093 grpc#33427 grpc#33568 These changes seem to have introduced some flaky crashes. Reverting while I investigate.
This rolls forward only the pick_first changes from #32692, which were rolled back in #33718. Specifically: - Changes PF to use its own subchannel list implementation instead of using the subchannel_list library, since the latter will be going away with the dualstack changes. - As a result of no longer using the subchannel_list library, PF no longer needs to set the `GRPC_ARG_INHIBIT_HEALTH_CHECKING` channel arg. - Adds an option to start a health watch on the chosen subchannel, to be used in the future when pick_first is the child of a petiole policy. (Currently, this code is not actually called anywhere.)
…34222) Rolls forward part of the dualstack changes, mostly from #33427 and a little bit from #32692, both of which were reverted in #33718. Specifically: - For petiole policies, unconditionally start health watch on subchannels, even if client side health checking is not enabled; in this case, the health watch will report the subchannel's raw connectivity state. - Fix edge cases in health check reporting that occur when a watcher is started before the initial state is reported. - When client-side health checking fails, add the subchannel's address to the RPC failure status message. - Outlier detection now works only via the health checking watch, not via the raw connectivity state watch. - Remove now-unnecessary hack to ensure that outlier detection does not work for pick_first.
More work on the dualstack backend design: