Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sidecar Flag: Random delay before upload blocks on sidecar #7239

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

Vanshikav123
Copy link
Contributor

@Vanshikav123 Vanshikav123 commented Mar 26, 2024

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

#7184
Implemented the logic to sleep for a random time between 0 and the specified jitter before uploading blocks.Defined a flag to specify the maximum random delay before uploading blocks.

Verification

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
@Vanshikav123
Copy link
Contributor Author

Hello ! @yeya24 PTAL

@@ -506,7 +510,30 @@ type sidecarConfig struct {
storeRateLimits store.SeriesSelectLimits
}

type durationValue time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to set duration via flag directly without implementing interface.

https://github.com/thanos-io/thanos/blob/main/cmd/thanos/receive.go#L968

@@ -363,6 +364,9 @@ func runSidecar(
uploadCompactedFunc, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc), conf.shipper.metaFileName)

return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
// Generate random delay using upload jitter.
jitter := time.Duration(rand.Int63n(int64(conf.shipper.uploadJitter)))
time.Sleep(jitter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of sleeping, can we do something similar to this https://github.com/cortexproject/cortex/blob/master/pkg/util/time.go#L73?

I image we will add a new function called runutil.RepeatWithJitter which takes the same parameters as runutil.Repeat but also takes variancePerc.
We can just hardcode variancePerc to a sane value like 0.05 so allows [-1.5s, 1.5s] jitter.

Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution. I am curious what is the benefit to adding jitter to the upload.

@@ -105,6 +106,19 @@ func registerSidecar(app *extkingpin.App) {
})
}

// DurationWithJitter returns random duration from "input - input*variancePerc" to "input + input*variancePerc" interval.
func DurationWithJitter(input time.Duration, variancePerc float64) time.Duration {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -518,4 +538,7 @@ func (sc *sidecarConfig) registerFlag(cmd extkingpin.FlagClause) {
sc.storeRateLimits.RegisterFlags(cmd)
cmd.Flag("min-time", "Start of time range limit to serve. Thanos sidecar will serve only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Default("0000-01-01T00:00:00Z").SetValue(&sc.limitMinTime)
conf := &sidecarConfig{}
cmd.Flag("upload-jitter", "Maximum random delay before uploading blocks.").Default("0s").DurationVar(&uploadJitter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we bind the flag directly to conf.shipper.uploadJitter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yess i will do it

case <-ctx.Done():
return nil
default:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@Vanshikav123
Copy link
Contributor Author

Thanks for this contribution. I am curious what is the benefit to adding jitter to the upload.

Thanks for this contribution. I am curious what is the benefit to adding jitter to the upload.

adding jitter prevents network blockage as it introduces random delays to the uploads

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
@Vanshikav123
Copy link
Contributor Author

Hello! @fpetkovski @yeya24 resolved the reviews PTAL!

jitter := DurationWithJitter(conf.shipper.uploadJitter, 0.05)
if jitter > 0 {
time.Sleep(jitter)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to add another jitter here? Isn't RepeatWithJitter enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should i also remove DurationWithJitter function or keep it for future use ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove that

Vanshikav123 and others added 2 commits April 5, 2024 22:30
@@ -362,7 +375,7 @@ func runSidecar(
s := shipper.New(logger, reg, conf.tsdb.path, bkt, m.Labels, metadata.SidecarSource,
uploadCompactedFunc, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc), conf.shipper.metaFileName)

return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
return runutil.RepeatWithJitter(30*time.Second, ctx, 0.05, func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.05 variance seems too small as it is only 30s * 0.05 = 1.5s?
I am thinking about 0.2 here maybe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved all !

@@ -518,4 +531,6 @@ func (sc *sidecarConfig) registerFlag(cmd extkingpin.FlagClause) {
sc.storeRateLimits.RegisterFlags(cmd)
cmd.Flag("min-time", "Start of time range limit to serve. Thanos sidecar will serve only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Default("0000-01-01T00:00:00Z").SetValue(&sc.limitMinTime)
conf := &sidecarConfig{}
cmd.Flag("upload-jitter", "Maximum random delay before uploading blocks.").Default("0s").DurationVar(&conf.shipper.uploadJitter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this if it is not used anymore

@@ -164,6 +164,7 @@ type shipperConfig struct {
allowOutOfOrderUpload bool
hashFunc string
metaFileName string
uploadJitter time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this if not needed

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
@@ -228,6 +228,7 @@ Flags:
configuration. See format details:
https://thanos.io/tip/thanos/tracing.md/#configuration
--tsdb.path="./data" Data directory of TSDB.
--upload-jitter=0s Maximum random delay before uploading blocks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix docs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it looks like this flag is never consumed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added nowhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yess will remove it

@@ -362,7 +362,7 @@ func runSidecar(
s := shipper.New(logger, reg, conf.tsdb.path, bkt, m.Labels, metadata.SidecarSource,
uploadCompactedFunc, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc), conf.shipper.metaFileName)

return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
return runutil.RepeatWithJitter(30*time.Second, ctx, 0.2, func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really help though if we are jittering only 6s; at least for us block uploads can take far longer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can increase the jitter. What value do you think that makes sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, the whole method feels kinda weird. It was about in house blob storage and blocks being uploaded at same time right?

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
// RepeatWithJitter executes f with a random jitter added to the interval between each execution.
// It continues until ctx is done or f returns an error.
// The jitter factor should be between 0 and 1, where 0 means no jitter and 1 means the interval can vary from 0 to 2 times the original interval.
func RepeatWithJitter(interval time.Duration, ctx context.Context, jitterFactor float64, f func() error) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ctx should be always first parameter. https://pkg.go.dev/context#pkg-overview

Vanshikav123 and others added 2 commits April 8, 2024 16:18
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants