
grpclb: implement subchannel caching #27657

Merged: 5 commits into grpc:master on Oct 13, 2021

Conversation

markdroth (Member):

This is something that we always should have done but never quite got around to, and we have reports of the subchannel churn causing problems for internal users.

@markdroth added the "release notes: no" label (indicates the PR should not be in release notes) on Oct 8, 2021
@markdroth requested a review from apolcyn on October 8, 2021 at 16:02

// Deleted subchannel caching.
const grpc_millis subchannel_cache_interval_ms_;
std::map<grpc_millis /*deletion time*/,
         std::vector<RefCountedPtr<SubchannelInterface>>>
    cached_subchannels_;
Member:

What if two subchannels are deleted at the same millisecond?
And why are we reimplementing a timer queue here?

I'm assuming what we ultimately want to do is just keep the subchannel reference around for a period of time, so ultimately we could write:

void Cache(RefCountedPtr<SubchannelInterface> p) {
  event_engine->RunAt(now() + 10s, [p](){});
}

Maybe we could find a way to write it similar to that with the current API?
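For readers outside the gRPC tree, here is a self-contained sketch of the lifetime trick that snippet relies on: the deferred no-op callback captures the smart pointer, so the subchannel stays alive until the callback is destroyed after the delay. The RunAfter helper, FakeSubchannel type, and detached-thread scheduler are stand-ins invented for this illustration, not gRPC or EventEngine APIs.

// Illustration only: NOT the gRPC EventEngine API.
#include <chrono>
#include <functional>
#include <memory>
#include <thread>

struct FakeSubchannel {};  // stand-in for SubchannelInterface

// Hypothetical helper: run cb once after `delay`, on a detached thread.
void RunAfter(std::chrono::milliseconds delay, std::function<void()> cb) {
  std::thread([delay, cb = std::move(cb)] {
    std::this_thread::sleep_for(delay);
    cb();  // when cb is destroyed, any refs it captured are released
  }).detach();
}

void Cache(std::shared_ptr<FakeSubchannel> p) {
  // The empty lambda exists only to hold `p` for ~10 seconds.
  RunAfter(std::chrono::seconds(10), [p] {});
}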

markdroth (Member Author):

The value of the map is a vector, so if there are multiple subchannels deleted in the same millisecond, they'll go in the same map entry. That's intentional, and in fact I expect it to happen very frequently due to the cached value of "now" in the ExecCtx.

The workflow is basically this:

  1. The grpclb policy gets an update from the balancer that does not include one or more addresses that were in the previous update.
  2. The grpclb policy sends the updated address list to the round_robin child policy.
  3. The round_robin policy calls the helper's CreateSubchannel() for every address in the new list, and then, as soon as that's done, unrefs the subchannels from the previous list. (Note that the same cached value of "now" in ExecCtx is used for all of these unrefs.)
  4. As each subchannel is unreffed, it gets added to cached_subchannels_ (in the same bucket, because the same value of "now"), and a timer is started when the first one is added.

The idea here is to minimize the number of timers and therefore the amount of memory used for the cache, since we know that there can be multiple subchannels removed in the same update from the balancer, and we know that another update is likely to come in (which may remove another set of subchannels) before the timer fires for the subchannels removed in the previous update (the balancer may send updates as often as every 1s, but we cache subchannels for 10s). This way, we basically have just one timer pending at any given time, no matter how many subchannels are cached.

I could instead structure this using a separate timer for each subchannel, but that would increase the amount of memory I'd have to store for each cached subchannel: instead of just the ref to the subchannel, I'd also need to store a timer and a closure.
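As a self-contained model of the bucketing scheme described above (illustrative names and simplified types only: int64_t milliseconds and std::shared_ptr stand in for grpc_millis and the subchannel refs, and the caller is assumed to own the single real timer):

#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <vector>

struct Subchannel {};  // stand-in for SubchannelInterface

class SubchannelCache {
 public:
  explicit SubchannelCache(int64_t cache_interval_ms)
      : cache_interval_ms_(cache_interval_ms) {}

  // Called when the wrapper unrefs a subchannel; now_ms plays the role of the
  // cached ExecCtx "now", so all unrefs from one balancer update share a bucket.
  void CacheDeletedSubchannelLocked(std::shared_ptr<Subchannel> sc,
                                    int64_t now_ms) {
    cached_subchannels_[now_ms + cache_interval_ms_].push_back(std::move(sc));
    timer_pending_ = true;  // the real policy starts its timer here if none is pending
  }

  // Called when that single timer fires: drop every bucket whose deadline has
  // passed, then return the next deadline (so the caller can re-arm one timer)
  // or nullopt if the cache is now empty.
  std::optional<int64_t> OnSubchannelCacheTimerLocked(int64_t now_ms) {
    timer_pending_ = false;
    while (!cached_subchannels_.empty() &&
           cached_subchannels_.begin()->first <= now_ms) {
      cached_subchannels_.erase(cached_subchannels_.begin());  // releases the refs
    }
    if (cached_subchannels_.empty()) return std::nullopt;
    timer_pending_ = true;  // caller re-arms for this deadline
    return cached_subchannels_.begin()->first;
  }

 private:
  const int64_t cache_interval_ms_;
  bool timer_pending_ = false;
  std::map<int64_t /*deletion time*/, std::vector<std::shared_ptr<Subchannel>>>
      cached_subchannels_;
};

Whatever the exact shape of the merged code, the property being argued for is that at most one timer deadline is outstanding at a time, so the per-subchannel cost is just the ref plus its slot in a bucket's vector.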

Member:

[none of this should be blocking]

Yup ok... mostly trying to make the event engine conversion easier, since I expect the code I wrote above would be preferred there, but as expressed we'll probably end up keeping the infrastructure here and making more complicated code.

I'm not sure that conservation of timers or memory warrants the additional long term complexity.

markdroth (Member Author):

Given how expensive memory is right now, it seemed worth the optimization. But I acknowledge that I have absolutely no data to justify it; it's just sort of a hunch. It didn't seem that hard to do it this way, so I figured I might as well do it. But if at some point it is causing problems, it also isn't hard to change it to work the other way.

I don't think this will affect the EventEngine conversion either way.

}

void GrpcLb::OnSubchannelCacheTimerLocked(grpc_error_handle error) {
  if (subchannel_cache_timer_pending_ && error == GRPC_ERROR_NONE) {
Contributor:

looks like we need to reset subchannel_cache_timer_pending_ in here?

markdroth (Member Author):

Good catch. Done.

// subchannel caching
//

void GrpcLb::CacheDeletedSubchannel(
Contributor:

Looks like StartSubchannelCacheTimer and this method need to be synchronized the same way as OnSubchannelCacheTimerLocked, in order to safely access cached_subchannels_. So let's suffix these methods with Locked?

markdroth (Member Author):

Done.
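For context, a generic illustration of the "Locked" suffix convention being asked for here (a plain std::mutex stands in for whatever synchronization context the policy actually uses; none of these names are from the PR): a *Locked method assumes the caller already holds the lock, so it may touch shared state such as cached_subchannels_ directly.

#include <mutex>
#include <vector>

class Example {
 public:
  void PublicEntryPoint(int value) {
    std::lock_guard<std::mutex> lock(mu_);
    AddLocked(value);  // safe: mu_ is held for the duration of the call
  }

 private:
  // Precondition: mu_ is held. The Locked suffix tells callers they must not
  // invoke this without synchronization.
  void AddLocked(int value) { shared_state_.push_back(value); }

  std::mutex mu_;
  std::vector<int> shared_state_;
};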

lb_token_(std::move(lb_token)),
client_stats_(std::move(client_stats)) {}

~SubchannelWrapper() override {
  if (!lb_policy_->shutting_down_) {
    lb_policy_->CacheDeletedSubchannel(wrapped_subchannel());
Contributor:

We could get rid of the timer loop in the grpclb policy if we just had this dtor allocate its own object holding a closure and a timer that fires GRPC_GRPCLB_DEFAULT_SUBCHANNEL_DELETION_DELAY_MS from now (this object would destroy itself and then unref the subchannel when the timer fired).

markdroth (Member Author):

We could do that, but we'd still need to keep track of all of the cached subchannels that are pending deletion, because we need to be able to cancel any pending timers when the LB policy shuts down. So I think this would increase memory usage without any real benefit, as per my reply to Craig below.
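A rough sketch of the bookkeeping this comment refers to (illustrative types only, not gRPC's timer API): with one timer per cached subchannel, the policy still needs a registry of pending entries so shutdown can cancel them all, and each entry now carries a timer in addition to the subchannel ref.

#include <list>
#include <memory>

struct Subchannel {};
struct FakeTimer { void Cancel() {} };  // stand-in for grpc_timer + cancellation

struct PendingDeletion {
  std::shared_ptr<Subchannel> subchannel;  // the ref being kept alive
  FakeTimer timer;                         // per-entry timer: the extra memory
};

class PolicySketch {
 public:
  void CacheDeletedSubchannel(std::shared_ptr<Subchannel> sc) {
    pending_.push_back({std::move(sc), FakeTimer{}});
    // a real implementation would arm pending_.back().timer here
  }

  void ShutdownLocked() {
    for (auto& entry : pending_) entry.timer.Cancel();  // cancel all pending timers
    pending_.clear();                                    // drop the cached refs
  }

 private:
  std::list<PendingDeletion> pending_;  // the registry that can't be avoided
};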

Contributor:

"we need to be able to cancel any pending timers when the LB policy shuts down"

Just for the thought experiment, what happens if we don't try to shut down these timers when the LB policy shuts down? I wonder if the per-subchannel timer approach could be made simpler this way.

markdroth (Member Author):

That would cause memory leaks on grpc shutdown.

Contributor:

grpc shutdown should cancel all pending timers globally, though, right? Is that not sufficient to prevent this?

The other thing I'm thinking about here is that one or more of these subchannels may be in the process of setting up a TCP connection, and TCP connection setup can't be cancelled anyway AFAIK.

Is this potential leak with the cached subchannel timers different from the case where TCP connections are still in the process of establishing and grpc shutdown is called?

markdroth (Member Author):

I think the correct behavior for any code is to cancel any async work that it has pending when it shuts down. I would consider not doing that to be a bug.

I'm not sure what happens in the case where the subchannel has a pending TCP connection setup; it may be that we have a bug there. But even if we do, I don't think that's a justification for adding a new bug here.

<< "backend " << i;
}
}
// TODO(roth): This should ideally check that backend 1 never lost its
Contributor:

I think we can check that backend 1 never lost its connection by checking that it only received RPCs from one peer IP:port

EXPECT_EQ(1UL, backends_[1]->service_.clients().size()); - like above?

markdroth (Member Author):

Good idea. Done.

DEBUG_LOCATION);
}

void GrpcLb::OnSubchannelCacheTimerLocked(grpc_error_handle error) {
Contributor:

Looks like error is missing an unref here

markdroth (Member Author):

Good catch! Fixed.

markdroth (Member Author):

Known issues: #27711

@markdroth markdroth merged commit 2b813d2 into grpc:master Oct 13, 2021
@copybara-service bot added the "imported" label (specifies the PR has been imported to the internal repository) on Oct 14, 2021
dennycd pushed a commit to dennycd/grpc that referenced this pull request Oct 15, 2021
* grpclb: implement subchannel caching

* code review changes

* fix clang tidy

* code review changes
Labels: imported (PR has been imported to the internal repository), lang/c++, lang/core, release notes: no (PR should not be in release notes)
Projects: none yet
Linked issues: none yet
4 participants