Reshard by redistributing samples to new queues #13769
Conversation
Thanks, I'll take a look at this.
(Hello from bug scrub meeting.) Friendly ping @machine424 if you want to review, but it feels that:
+ @csmarchbanks @cstyan, you might want to review this too as remote maintainers.
@darshanime this is on my review list for this week 👍
This seems like a good change, but we should do some testing. As Bartek mentioned, we could run into unintended consequences: this is a hot path for general throughput, and resharding already has the potential to cause problems with how long it takes.
Can we hedge the redistribution? Like, only redistribute if we have more than (# of shards * batch size) samples buffered, or if the flushing takes longer than X seconds (the flush deadline, perhaps?). A rough sketch of that check is below.
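A minimal sketch of what the count-based hedge could look like, assuming access to the per-shard `batch` buffers; the helper name `shouldRedistribute` and the exact threshold are illustrative, not something this PR defines:

```go
// Hypothetical helper, not part of the PR: only attempt redistribution when
// enough samples are buffered that draining them the normal way would be
// slow or lossy. A time-based condition (flush exceeding flushDeadline)
// could be OR'ed in here as well.
func (s *shards) shouldRedistribute(numShards int) bool {
	buffered := 0
	for _, q := range s.queues {
		buffered += len(q.batch) // samples still waiting in each old shard's batch
	}
	return buffered > numShards*s.qm.cfg.MaxSamplesPerSend
}
```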
storage/remote/queue_manager.go (Outdated)
```go
func (s *shards) reshard(numShards int) bool {
	// Exclusive lock to ensure that this does not run concurrently with enqueue.
	s.mtx.Lock()
	defer s.mtx.Unlock()

	newQueues := make([]*queue, numShards)
	for i := 0; i < numShards; i++ {
		newQueues[i] = newQueue(s.qm.cfg.MaxSamplesPerSend, s.qm.cfg.Capacity)
	}

	for _, queue := range s.queues {
		queue.batchMtx.Lock()

		for _, ts := range queue.batch {
			queueIndex := uint64(ts.ref) % uint64(len(newQueues))
			added := newQueues[queueIndex].Append(ts)
			if !added {
				// We are not able to add, we can revert to the start/stop loop.
				queue.batchMtx.Unlock()
				return false
			}
		}
	}

	// We have successfully moved all the samples, now can delete the old queues.
	for _, queue := range s.queues {
		close(queue.batchQueue)
		queue.batchMtx.Unlock()
	}

	// Waiting till flushDeadline for all the runShards to terminate.
	select {
	case <-s.done:
	case <-time.After(s.qm.flushDeadline):
		// Cancelling the current context so as to unblock client calls.
		s.hardShutdown()
		<-s.done
	}

	s.queues = newQueues
	var hardShutdownCtx context.Context
	hardShutdownCtx, s.hardShutdown = context.WithCancel(context.Background())
	s.running.Store(int32(numShards))
	s.done = make(chan struct{})
	for i := 0; i < numShards; i++ {
		go s.runShard(hardShutdownCtx, i, newQueues[i])
	}

	return true
}
```
This function feels like it's doing a little bit too much IMO. Is there any refactoring we can do so we can reuse `stop` and `start`?
What do you think of this: rename this function to `redistributeSamples`, have it return the same bool of whether it was successful or not, and then also update `shards.stop` to take a bool `force`. In `stop` we would skip the graceful shutdown attempt and go straight to the unclean shutdown if `force` is true. Then, we can just check `successful` in the `reshardLoop`, and call `stop` appropriately, but also always call `start`? A rough sketch of that shape is below.
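A rough sketch of the shape being described here; the `force` parameter, the `redistributeSamples` signature, and the caller name are illustrative, not the PR's actual code:

```go
// Sketch only: shards.stop gains a force flag so callers can skip the
// graceful drain and go straight to the unclean shutdown.
func (s *shards) stop(force bool) {
	if !force {
		// existing graceful path: close the batch queues and wait up to
		// flushDeadline for the runShard goroutines to drain and exit
	}
	// existing unclean path: cancel the hard-shutdown context and wait on s.done
}

// Hypothetical caller in the reshard loop: try redistribution first, then
// reuse stop/start. If the samples were already moved, a forced stop loses
// nothing; otherwise fall back to the graceful stop that tries to flush them.
func (t *QueueManager) reshardOnce(numShards int) {
	successful := t.shards.redistributeSamples(numShards)
	t.shards.stop(successful)
	t.shards.start(numShards)
}
```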
Doing this ☝️ might mean we need to rearrange where the locks are acquired.
@darshanime we had a chance to test this out in a dev environment yesterday: no obvious issues, we were still able to achieve the same throughput, and it looked like we were resharding within a slightly smaller spread of shard replicas, which is what I would expect to see.
However, one thing we did notice: the metric for the # of active shards is not being set properly anymore. We could set it here in this function, but I think it's probably better to set it on line 1135/1136, after we get the `successful` return value from calling `reshard`.
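Something along these lines, presumably; the exact metrics field is an assumption here, not taken from the diff:

```go
// Illustrative placement only: update the active-shards gauge once reshard
// reports success, rather than setting it inside reshard itself. The
// metrics field name is an assumption.
successful := t.shards.reshard(numShards)
if successful {
	t.metrics.numShards.Set(float64(numShards))
}
```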
bump @darshanime
Signed-off-by: darshanime <deathbullet@gmail.com>
Thanks for the review, @cstyan
Currently, we have two ways of resharding: triggering a soft shutdown and waiting for all the queues to drain (prone to tail latency amplification), and, if that fails, triggering a hard shutdown and dropping samples. This PR adds a third way: redistributing the samples to new shards. The original issue we're trying to address is reducing the impact of resharding, both in terms of the time it takes to complete and the samples dropped. With that in mind, IMO we should attempt the redistribution first (since it takes "no latency" and causes no dropped samples), falling back to the other two ways if that fails. We can hedge the redistribution wrt
👍 I've broken the
I think this may lead to higher dropped samples if the redistribution fails? Note, when we attempt redistribution and it fails, we haven't spent any time yet, so retaining the original soft -> hard shutdown won't change the current latency characteristics. Can add skipping soft shutdown behind a flag, wdyt?
Thanks, fixed. I've added it to the
True. The fact that both
This seems like a reasonable compromise. Let me think about it some more.
```go
for i := 0; i < numShards; i++ {
	newQueues[i] = newQueue(s.qm.cfg.MaxSamplesPerSend, s.qm.cfg.Capacity)
}

successful := s.redistributeSamples(newQueues)
```
One thing I'm thinking about: with the way things are written currently, if we are downsharding we could possibly often get `successful = false` when attempting to redistribute.
If we're downsharding and all the existing queues are at full `Capacity`, the new, smaller number of shards won't be able to buffer that many samples. In which case, we either want to know that ahead of time and not attempt to redistribute, or have a way to ensure queues that are having samples redistributed to them can send when they're full. (A sketch of the ahead-of-time check is below.)
Additionally, we're potentially doubling the amount of memory used by enqueued samples for the duration of the resharding operation.
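A minimal sketch of that ahead-of-time check; the helper name and the capacity arithmetic are assumptions about the queue internals, not something the PR defines:

```go
// Hypothetical pre-check, not part of the PR: skip redistribution entirely
// when the buffered samples cannot fit into the new, smaller set of queues.
func (s *shards) redistributionFits(numShards int) bool {
	buffered := 0
	for _, q := range s.queues {
		buffered += len(q.batch) // samples waiting in each old shard's batch
	}
	// Assume each new queue can absorb roughly Capacity samples before
	// Append would fail; bail out early if the total would overflow that.
	return buffered <= numShards*s.qm.cfg.Capacity
}
```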
Are we not also missing a call to `runShard`? We're just creating new queues but passing them to the existing shards?
Nevermind, found it.
```go
for i := 0; i < numShards; i++ {
	go s.runShard(hardShutdownCtx, i, newQueues[i])
}
```
Yeah, potentially we want to start these immediately after creating the queues; otherwise we might fail to redistribute when downsharding. A rough sketch of that ordering is below.
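Something like this ordering, assuming the hard-shutdown context for the new shards has already been recreated by this point (which is not how the current diff is ordered):

```go
// Illustrative reordering only, not the PR's code: start each new shard's
// runShard goroutine as soon as its queue exists, so a full new queue can
// drain while samples are still being redistributed into it.
for i := 0; i < numShards; i++ {
	newQueues[i] = newQueue(s.qm.cfg.MaxSamplesPerSend, s.qm.cfg.Capacity)
	go s.runShard(hardShutdownCtx, i, newQueues[i])
}
successful := s.redistributeSamples(newQueues)
```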
During resharding, try to redistribute the samples amongst the new queues instead of waiting for them all to be sent out, which is prone to tail latency.
closes #7230