grpclb: implement subchannel caching #27657
Conversation
```cpp
// Deleted subchannel caching.
const grpc_millis subchannel_cache_interval_ms_;
std::map<grpc_millis /*deletion time*/,
```
What if two subchannels are deleted at the same millisecond?
And why are we reimplementing a timer queue here?
I'm assuming what we ultimately want to do is just keep the subchannel reference around for a period of time, so ultimately we could write:
```cpp
void Cache(RefCountedPtr<SubchannelInterface> p) {
  event_engine->RunAt(now() + 10s, [p]() {});
}
```
Maybe we could find a way to write it similar to that with the current API?
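For concreteness, here is a runnable toy version of that idea. `FakeEventEngine`, its manually-advanced clock, and the use of `std::shared_ptr` in place of `RefCountedPtr` are all illustrative assumptions, not the real EventEngine API:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Toy stand-in for an EventEngine-style RunAt() API: callbacks are stored
// with their deadlines and fired manually by AdvanceTo(). Illustrative only.
class FakeEventEngine {
 public:
  void RunAt(int64_t when_ms, std::function<void()> cb) {
    pending_.emplace(when_ms, std::move(cb));
  }
  // Runs (and discards) every callback whose deadline is <= now_ms.
  void AdvanceTo(int64_t now_ms) {
    while (!pending_.empty() && pending_.begin()->first <= now_ms) {
      auto cb = std::move(pending_.begin()->second);
      pending_.erase(pending_.begin());
      cb();
    }
  }

 private:
  std::multimap<int64_t, std::function<void()>> pending_;
};

struct Subchannel {};

// The pattern from the comment above: capturing the ref in a no-op callback
// keeps the subchannel alive until the deadline, with no explicit cache.
void Cache(FakeEventEngine* engine, std::shared_ptr<Subchannel> p,
           int64_t now_ms) {
  engine->RunAt(now_ms + 10000, [p]() {});  // ref dropped when lambda dies
}
```

The subchannel's lifetime is extended purely by the closure holding the last reference; destroying the closure after it runs is what releases the subchannel.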
The value of the map is a vector, so if there are multiple subchannels deleted in the same millisecond, they'll go in the same map entry. That's intentional, and in fact I expect it to happen very frequently due to the cached value of "now" in the `ExecCtx`.
The workflow is basically this:
- The `grpclb` policy gets an update from the balancer that does not include one or more addresses that were in the previous update.
- The `grpclb` policy sends the updated address list to the `round_robin` child policy.
- The `round_robin` policy calls the helper's `CreateSubchannel()` for every address in the new list, and then unrefs the subchannels from the previous list as soon as it is done. (Note that the same cached value of "now" in `ExecCtx` is used for all of these unrefs.)
- As each subchannel is unreffed, it gets added to `cached_subchannels_` (in the same bucket, because of the same value of "now"), and a timer is started when the first one is added.
The idea here is to minimize the number of timers, and therefore the amount of memory used for the cache. We know that multiple subchannels can be removed in the same update from the balancer, and we know that another update is likely to come in (which may remove another set of subchannels) before the timer fires for the subchannels removed in the previous update: the balancer may send updates as often as every 1s, but we cache subchannels for 10s. This way, we basically have just one timer pending at any given time, no matter how many subchannels are cached.
I could instead structure this using a separate timer for each subchannel, but that would increase the amount of memory I'd have to store for each cached subchannel: instead of just the ref to the subchannel, I'd also need to store a timer and a closure.
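To make the bucketing concrete, here is a minimal, self-contained sketch of the structure described above. `SubchannelCache`, `Millis`, and the use of `std::shared_ptr` in place of `RefCountedPtr` are simplifications for illustration; the real implementation lives in `grpclb.cc` and starts and cancels an actual `grpc_timer`:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-ins for grpc-core types; names are illustrative only.
using Millis = int64_t;
struct Subchannel {};
using SubchannelRef = std::shared_ptr<Subchannel>;

// Models the single-timer cache described above: all subchannels unreffed
// with the same cached value of "now" land in the same bucket, so one timer
// (for the earliest bucket) suffices no matter how many are cached.
class SubchannelCache {
 public:
  explicit SubchannelCache(Millis interval_ms) : interval_ms_(interval_ms) {}

  // Called as each subchannel is unreffed; buckets by deletion deadline.
  void CacheDeletedSubchannel(SubchannelRef subchannel, Millis now) {
    cached_subchannels_[now + interval_ms_].push_back(std::move(subchannel));
    // The real code starts the timer here iff no timer is already pending.
  }

  // Models the timer callback: releases every bucket whose deadline has
  // passed; the real code then restarts the timer for the next bucket.
  void OnTimerFired(Millis now) {
    auto it = cached_subchannels_.begin();
    while (it != cached_subchannels_.end() && it->first <= now) {
      it = cached_subchannels_.erase(it);  // drops the cached refs
    }
  }

  size_t num_buckets() const { return cached_subchannels_.size(); }

 private:
  const Millis interval_ms_;
  std::map<Millis /*deletion time*/, std::vector<SubchannelRef>>
      cached_subchannels_;
};
```

Using an ordered `std::map` keyed by deadline means the earliest bucket is always `begin()`, which is what makes the "one pending timer for the front bucket" scheme cheap.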
[none of this should be blocking]
Yup, ok... I'm mostly trying to make the EventEngine conversion easier, since I expect the code I wrote above would be preferred there; but as written, we'll probably end up keeping the infrastructure here and carrying the more complicated code forward.
I'm not sure that conservation of timers or memory warrants the additional long term complexity.
Given how expensive memory is right now, it seemed worth the optimization. But I acknowledge that I have absolutely no data to justify it; it's just sort of a hunch. It didn't seem that hard to do it this way, so I figured I might as well do it. But if at some point it is causing problems, it also isn't hard to change it to work the other way.
I don't think this will affect the EventEngine conversion either way.
```cpp
}

void GrpcLb::OnSubchannelCacheTimerLocked(grpc_error_handle error) {
  if (subchannel_cache_timer_pending_ && error == GRPC_ERROR_NONE) {
```
Looks like we need to reset `subchannel_cache_timer_pending_` in here?
Good catch. Done.
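For reference, a minimal model of the flag handling that was fixed here. `TimerOwner` is a toy: the real callback takes a `grpc_error_handle` and processes the cache buckets, but the reset-the-flag pattern is the same:

```cpp
// Toy model of the timer-pending flag pattern discussed above; names are
// illustrative, and the error handle is simplified to a `cancelled` bool.
class TimerOwner {
 public:
  void StartTimer() { timer_pending_ = true; }

  // The fix: clear the flag once the callback actually runs, so shutdown
  // does not later try to cancel a timer that has already fired.
  void OnTimerFired(bool cancelled) {
    if (timer_pending_ && !cancelled) {
      timer_pending_ = false;
      // ...release expired cache buckets; restart the timer if any remain...
    }
  }

  bool timer_pending() const { return timer_pending_; }

 private:
  bool timer_pending_ = false;
};
```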
```cpp
// subchannel caching
//

void GrpcLb::CacheDeletedSubchannel(
```
Looks like `StartSubchannelCacheTimer` and this method need to be synchronized the same way as `OnSubchannelCacheTimerLocked`, in order to safely access `cached_subchannels_`. So let's suffix these methods with `Locked`?
Done.
```cpp
      lb_token_(std::move(lb_token)),
      client_stats_(std::move(client_stats)) {}

~SubchannelWrapper() override {
  if (!lb_policy_->shutting_down_) {
    lb_policy_->CacheDeletedSubchannel(wrapped_subchannel());
```
We could get rid of the timer loop in the grpclb policy if we just had this dtor allocate its own object holding a closure and a timer that fires `GRPC_GRPCLB_DEFAULT_SUBCHANNEL_DELETION_DELAY_MS` from now (this object would destroy itself and unref the subchannel when the timer fired).
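A self-contained sketch of that alternative, assuming a toy `TimerQueue` in place of `grpc_timer` (all names here are illustrative, and the per-holder timer and closure are exactly the extra per-subchannel memory weighed in the reply below):

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

struct Subchannel {};

// Toy timer queue standing in for grpc_timer; Fire() runs every callback
// due at or before `now`. Illustrative only.
class TimerQueue {
 public:
  void Add(int64_t deadline, std::function<void()> cb) {
    pending_.emplace(deadline, std::move(cb));
  }
  void Fire(int64_t now) {
    while (!pending_.empty() && pending_.begin()->first <= now) {
      auto cb = std::move(pending_.begin()->second);
      pending_.erase(pending_.begin());
      cb();
    }
  }

 private:
  std::multimap<int64_t, std::function<void()>> pending_;
};

// Models the per-subchannel alternative: a heap-allocated holder keeps the
// ref and deletes itself (dropping the ref) when its own timer fires.
class CachedSubchannelHolder {
 public:
  static void Cache(std::shared_ptr<Subchannel> sub, TimerQueue* timers,
                    int64_t deadline) {
    auto* holder = new CachedSubchannelHolder(std::move(sub));
    timers->Add(deadline, [holder]() { delete holder; });  // self-destroys
  }

 private:
  explicit CachedSubchannelHolder(std::shared_ptr<Subchannel> s)
      : sub_(std::move(s)) {}
  std::shared_ptr<Subchannel> sub_;
};
```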
We could do that, but we'd still need to keep track of all of the cached subchannels that are pending deletion, because we need to be able to cancel any pending timers when the LB policy shuts down. So I think this would increase memory usage without any real benefit, as per my reply to Craig below.
> we need to be able to cancel any pending timers when the LB policy shuts down
Just for the thought experiment, what happens if we don't try to shut down these timers when the LB policy shuts down? I wonder if the per-subchannel timer approach could be made simpler this way.
That would cause memory leaks on grpc shutdown.
grpc shutdown should cancel all pending timers globally, though, right? Is that not sufficient to prevent this?
The other thing I'm thinking about here is that one or more of these subchannels may be in the process of setting up a TCP connection, and TCP connection setup can't be cancelled anyway, AFAIK.
Is this potential leak with the cached subchannel timers different from the case where TCP connections are still in the process of establishing and grpc shutdown is called?
I think the correct behavior for any code is to cancel any async work that it has pending when it shuts down. I would consider not doing that to be a bug.
I'm not sure what happens in the case where the subchannel has a pending TCP connection setup; it may be that we have a bug there. But even if we do, I don't think that's a justification for adding a new bug here.
```cpp
        << "backend " << i;
  }
}
// TODO(roth): This should ideally check that backend 1 never lost its
```
I think we can check that backend 1 never lost its connection by checking that it only received RPCs from one peer IP:port, like above?

```cpp
EXPECT_EQ(1UL, backends_[1]->service_.clients().size());
```
Good idea. Done.
```cpp
                   DEBUG_LOCATION);
}

void GrpcLb::OnSubchannelCacheTimerLocked(grpc_error_handle error) {
```
Looks like `error` is missing an unref here.
Good catch! Fixed.
Known issues: #27711
* grpclb: implement subchannel caching
* code review changes
* fix clang tidy
* code review changes
This is something that we always should have done but never quite got around to, and we have reports of the subchannel churn causing problems for internal users.