New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADDED] Exponential backoff for server reconnects #1224
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,8 +50,7 @@ const ( | |
Version = "1.24.0" | ||
DefaultURL = "nats://127.0.0.1:4222" | ||
DefaultPort = 4222 | ||
DefaultMaxReconnect = 60 | ||
DefaultReconnectWait = 2 * time.Second | ||
DefaultMaxReconnect = -1 | ||
DefaultReconnectJitter = 100 * time.Millisecond | ||
DefaultReconnectJitterTLS = time.Second | ||
DefaultTimeout = 2 * time.Second | ||
|
@@ -62,6 +61,10 @@ const ( | |
RequestChanLen = 8 | ||
DefaultDrainTimeout = 30 * time.Second | ||
LangString = "go" | ||
|
||
// DEPRECATED: Client now uses [nats.DefaultReconnectBackoffHandler] to | ||
// handle default reconnect wait time. | ||
DefaultReconnectWait = 2 * time.Second | ||
) | ||
|
||
const ( | ||
|
@@ -143,17 +146,17 @@ var ( | |
// GetDefaultOptions returns default configuration options for the client. | ||
func GetDefaultOptions() Options { | ||
return Options{ | ||
AllowReconnect: true, | ||
MaxReconnect: DefaultMaxReconnect, | ||
ReconnectWait: DefaultReconnectWait, | ||
ReconnectJitter: DefaultReconnectJitter, | ||
ReconnectJitterTLS: DefaultReconnectJitterTLS, | ||
Timeout: DefaultTimeout, | ||
PingInterval: DefaultPingInterval, | ||
MaxPingsOut: DefaultMaxPingOut, | ||
SubChanLen: DefaultMaxChanLen, | ||
ReconnectBufSize: DefaultReconnectBufSize, | ||
DrainTimeout: DefaultDrainTimeout, | ||
AllowReconnect: true, | ||
MaxReconnect: DefaultMaxReconnect, | ||
ReconnectJitter: DefaultReconnectJitter, | ||
ReconnectJitterTLS: DefaultReconnectJitterTLS, | ||
Timeout: DefaultTimeout, | ||
PingInterval: DefaultPingInterval, | ||
MaxPingsOut: DefaultMaxPingOut, | ||
SubChanLen: DefaultMaxChanLen, | ||
ReconnectBufSize: DefaultReconnectBufSize, | ||
DrainTimeout: DefaultDrainTimeout, | ||
IgnoreAuthErrorAbort: true, | ||
} | ||
} | ||
|
||
|
@@ -470,6 +473,7 @@ type Options struct { | |
|
||
// IgnoreAuthErrorAbort - if set to true, client opts out of the default connect behavior of aborting | ||
// subsequent reconnect attempts if server returns the same auth error twice (regardless of reconnect policy). | ||
// DEPRECATED: This option will be removed in future releases. | ||
IgnoreAuthErrorAbort bool | ||
|
||
// SkipHostLookup skips the DNS lookup for the server hostname. | ||
|
@@ -1260,13 +1264,22 @@ func CustomInboxPrefix(p string) Option { | |
|
||
// IgnoreAuthErrorAbort opts out of the default connect behavior of aborting | ||
// subsequent reconnect attempts if server returns the same auth error twice. | ||
// DEPRECATED: This option is now set to 'true' by default, therefore this option will be removed in future releases. | ||
func IgnoreAuthErrorAbort() Option { | ||
return func(o *Options) error { | ||
o.IgnoreAuthErrorAbort = true | ||
return nil | ||
} | ||
} | ||
|
||
// AbortOnAuthErrors causes the client to bail out after 2 subsequent auth connection errors. | ||
func AbortOnAuthErrors() Option { | ||
return func(o *Options) error { | ||
o.IgnoreAuthErrorAbort = false | ||
return nil | ||
} | ||
} | ||
|
||
// SkipHostLookup is an Option to skip the host lookup when connecting to a server. | ||
func SkipHostLookup() Option { | ||
return func(o *Options) error { | ||
|
@@ -2559,6 +2572,28 @@ func (nc *Conn) stopPingTimer() { | |
} | ||
} | ||
|
||
// DefaultReconnectBackoffHandler returns a default reconnect exponential backoff interval. | ||
// Base reconnect wait is 10ms, with x2 multiplier. Max wait time is 2 minutes. | ||
// 10ms, 20ms, 40ms, 80ms...2m | ||
// A random jitter is added to the result. | ||
func DefaultReconnectBackoffHandler(jitter time.Duration) ReconnectDelayHandler { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With this feature, I am wondering if some of the other options we have on the clients should be deprecated or go away (at the very least, this should be the default) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is default with this PR (I removed having |
||
return func(attempts int) time.Duration { | ||
// base interval is 10ms | ||
backoff := 10 * time.Millisecond | ||
for i := 0; i < attempts-1; i++ { | ||
backoff *= 2 | ||
if backoff > 2*time.Minute { | ||
backoff = 2 * time.Minute | ||
break | ||
} | ||
} | ||
if jitter > 0 { | ||
jitter = time.Duration(rand.Int63n(int64(jitter))) | ||
} | ||
return backoff + jitter | ||
} | ||
} | ||
|
||
// Try to reconnect using the option parameters. | ||
// This function assumes we are allowed to reconnect. | ||
func (nc *Conn) doReconnect(err error) { | ||
|
@@ -2596,18 +2631,19 @@ func (nc *Conn) doReconnect(err error) { | |
var wlf int | ||
|
||
var jitter time.Duration | ||
var rw time.Duration | ||
// If a custom reconnect delay handler is set, this takes precedence. | ||
crd := nc.Opts.CustomReconnectDelayCB | ||
if crd == nil { | ||
rw = nc.Opts.ReconnectWait | ||
rw := nc.Opts.ReconnectWait | ||
if crd == nil && rw == 0 { | ||
// TODO: since we sleep only after the whole list has been tried, we can't | ||
// rely on individual *srv to know if it is a TLS or non-TLS url. | ||
// We have to pick which type of jitter to use, for now, we use these hints: | ||
jitter = nc.Opts.ReconnectJitter | ||
if nc.Opts.Secure || nc.Opts.TLSConfig != nil { | ||
jitter = nc.Opts.ReconnectJitterTLS | ||
} | ||
|
||
crd = DefaultReconnectBackoffHandler(jitter) | ||
} | ||
|
||
for i := 0; len(nc.srvPool) > 0; { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to change into a span of time but not sure about changing this var as the default since has been like that from the start... maybe as an option to test out new behavior like
MaxTotalReconnectTime(2*time.Minute)
? We also need to be able to setmax reconnect
to -1 so that it means that we want the client to reconnect forever and never give up reconnectingThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so on the websocket client under firefox we get huge issue, because aside from the fact that they don't honor socket requests to cancel etc, if the client doesn't somewhat line up, all sorts of crazy stuff happens - Firefox is the only one to implement that, but at any rate here's the description
https://www.rfc-editor.org/rfc/rfc6455#section-7.2.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piotrpio's change is right in line with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with values as we have them, with possibly adding a new option "ExponentialBackoff" which would be or the start backoff - understanding that it doubles every try and then at 10 increases, it resets - The trick is that at 2s start, that means 17m interval on the 10th try which is possibly way too long. Possibly it is right to say the first retry is like 25ms or so... - That would set the upper bound near 1m if I have my maths right.