New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: make subchannel creation timing restriction stricter #7790
core: make subchannel creation timing restriction stricter #7790
Conversation
… subchannel creation if the channel is being shutting down and the delayed transport is terminated (aka, all retry calls has been finished).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is pretty safe for the createSubchannel(CreateSubchannelArgs)
method. But it is racy for createSubchannel(List<EquivalentAddressGroup>, Attributes)
. You might investigate when we deprecated the old method; it might be an appropriate time to delete it.
It doesn't seem to be racy, as |
More like, "there is not a way for the caller of the API to use the method correctly." Yes, it will not be a data race. But it will still be a race because it will non-deterministically throw an exception. |
…e_stricter_subchannel_creation_timing
#7793 has been merged. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be really helpful to include additional details about why we are making this change. Like referencing back to the exception we saw. Why we make a change is generally at least as important as what change is being made.
It seems we should also simplify SubchannelImpl. |
Hmm... That kind of cleanup could (should) have been done in #7793. Let's do that in a separate PR and make this change easily noticeable. |
How do you figure? It seems directly related to us changing |
Oh you are talking about prohibiting start() (I thought you were saying what's being mentioned in the TODO comment to remove volatile). Sure, we do the same for Subchannel's start(). This further syncs subchannel lifecycle with load balancer state. |
I will create a separate PR for cleanups left in #5015 (comment) then we can close that issue. |
Throw for subchannel creation if the channel is being shutting down and the delayed transport is terminated (aka, all retry calls has been finished). This enforces load balancer implementations to avoid creating subchannels after being shut down.
Throw for subchannel creation if the channel is being shutting down and the delayed transport is terminated (aka, all retry calls has been finished).
This enforces a tighter timing restriction on load balancer implementations to ensure no subchannel should be created after the balancer itself has been shut down. A load balancer implementation may not always check its state before creating new subchannels. They can be out of sync when the subchannel creation is queued and executed later. We had run into exceptions below in xDS where PriorityLoadBalancer enqueues addresses propagation to its child policy (to avoid reentrancy by
updateBalancingState()
upcalls) while the Channel (and therefore the LB tree) is shut down before it gets executed. Although we've fix existing LB implementations to not delay addresses propagation (instead enqueue upcalls if necessary), checking the consistency at LoadBalancer.Helper and making it throw if the load balancer had been shut down can help such bugs in load balancer implementations surface earlier.