New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove MasterNodeRequest#TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT
#107984
Labels
>bug
:Distributed/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
Meta
Team:Distributed
Meta label for distributed team
>tech debt
Comments
DaveCTurner
added
>bug
Meta
:Distributed/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
>tech debt
labels
Apr 28, 2024
Pinging @elastic/es-distributed (Team:Distributed) |
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
Apr 28, 2024
The values `30s` and `1m` are used as defaults in various places in ES, there's no need to create a new `TimeValue` instance each time they appear. Moreover we already have constants for `0` and `-1`, but we don't use these constants when reading the values off the wire. This commit adds constants for `30s` and `1m` and adjusts the deserialization code to avoid unnecessary allocation for common `TimeValue` instances. Relates elastic#107984
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
Apr 28, 2024
Many APIs accept a `?master_timeout` parameter, but reading this parameter requires a little unnecessary boilerplate to specify the literal parameter name and default value. Moreover, today's convention is to construct a `MasterNodeRequest` and then read the default master timeout from the freshly-created request. In practice this results in a default of 30s, but we specify in the docs that this default is _always_ 30s, and in principle one could create a transport request with a different initial value which would deviate from the documented behaviour. This commit introduces a utility method for reading this parameter in a fashion which is completely consistent with the documented behaviour. Relates elastic#107984
DaveCTurner
added a commit
that referenced
this issue
Apr 29, 2024
Many APIs accept a `?master_timeout` parameter, but reading this parameter requires a little unnecessary boilerplate to specify the literal parameter name and default value. Moreover, today's convention is to construct a `MasterNodeRequest` and then read the default master timeout from the freshly-created request. In practice this results in a default of 30s, but we specify in the docs that this default is _always_ 30s, and in principle one could create a transport request with a different initial value which would deviate from the documented behaviour. This commit introduces a utility method for reading this parameter in a fashion which is completely consistent with the documented behaviour. Relates #107984
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
Apr 29, 2024
There's a couple of APIs whose `?timeout` parameter also sets the `?master_timeout` timeout unless the latter is specified separately. Today this logic happens within the relevant requests' `timeout()` setters, but it'd be enormously preferable to implement this REST-layer-specific logic in the REST layer instead, and configure the timeouts in the transport-layer requests explicitly. This commit does so. Relates elastic#107984
elasticsearchmachine
pushed a commit
that referenced
this issue
Apr 29, 2024
There's a couple of APIs whose `?timeout` parameter also sets the `?master_timeout` timeout unless the latter is specified separately. Today this logic happens within the relevant requests' `timeout()` setters, but it'd be enormously preferable to implement this REST-layer-specific logic in the REST layer instead, and configure the timeouts in the transport-layer requests explicitly. This commit does so. Relates #107984
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
Apr 29, 2024
There's no good reason for this field to have `protected` visibility, and we definitely don't want subclasses to be able to set it to `null`. This commit makes it `private`. Relates elastic#107984
elasticsearchmachine
pushed a commit
that referenced
this issue
Apr 29, 2024
The values `30s` and `1m` are used as defaults in various places in ES, there's no need to create a new `TimeValue` instance each time they appear. Moreover we already have constants for `0` and `-1`, but we don't use these constants when reading the values off the wire. This commit adds constants for `30s` and `1m` and adjusts the deserialization code to avoid unnecessary allocation for common `TimeValue` instances. Relates #107984
DaveCTurner
added a commit
that referenced
this issue
Apr 30, 2024
There's no good reason for this field to have `protected` visibility, and we definitely don't want subclasses to be able to set it to `null`. This commit makes it `private`. Relates #107984
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
May 8, 2024
Removes the default constructors for `MasterRequest`, `MasterReadRequest` and `AcknowledgedRequest` in favour of constructors which require subclasses to specify the relevant timeouts. This will avoid bugs like elastic#107857 which are caused by a missing `super()` call. Also deprecates and renames the default to make it clear it should not be used in new code. Relates elastic#107984
elasticsearchmachine
pushed a commit
that referenced
this issue
May 13, 2024
Removes the default constructors for `MasterRequest`, `MasterReadRequest` and `AcknowledgedRequest` in favour of constructors which require subclasses to specify the relevant timeouts. This will avoid bugs like #107857 which are caused by a missing `super()` call. Also deprecates and renames the default to make it clear it should not be used in new code. Relates #107984
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
May 14, 2024
These overloads are no longer used so this commit removes them. Relates elastic#107984
elasticsearchmachine
pushed a commit
that referenced
this issue
May 14, 2024
These overloads are no longer used so this commit removes them. Relates #107984
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
May 14, 2024
elasticsearchmachine
pushed a commit
that referenced
this issue
May 15, 2024
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
May 16, 2024
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
May 16, 2024
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
May 16, 2024
DaveCTurner
added a commit
that referenced
this issue
May 16, 2024
DaveCTurner
added a commit
that referenced
this issue
May 16, 2024
nicktindall
pushed a commit
to nicktindall/elasticsearch
that referenced
this issue
May 17, 2024
nicktindall
pushed a commit
to nicktindall/elasticsearch
that referenced
this issue
May 17, 2024
DaveCTurner
changed the title
Remove trappy implicit default
Remove May 17, 2024
MasterNodeRequest#masterNodeTimeout
MasterNodeRequest#TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
May 17, 2024
parkertimmins
pushed a commit
to parkertimmins/elasticsearch
that referenced
this issue
May 17, 2024
parkertimmins
pushed a commit
to parkertimmins/elasticsearch
that referenced
this issue
May 17, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
>bug
:Distributed/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
Meta
Team:Distributed
Meta label for distributed team
>tech debt
Today we quietly set a default of 30s for
MasterNodeRequest#masterNodeTimeout
, relying on callers to override it where appropriate. However it is essentially always appropriate to override this timeout in production code. Forgetting to do so leads to subtle but critical bugs that only arise when a cluster is struggling to process cluster state updates as fast as normal, in which there is no way to lengthen these timeouts at runtime to bring the cluster back to health. There are several such bugs in Elasticsearch today (e.g. #107857, but I've found others too).Instead in production code we should move away from having any quiet default for this timeout and instead require callers to specify an appropriate value when creating the request instance in the first place. For example:
?master_timeout
request parameter, whose default ofRestUtils#REST_MASTER_TIMEOUT_DEFAULT
(30s) is explicitly documented and conceptually separate fromMasterNodeRequest#DEFAULT_MASTER_NODE_TIMEOUT
(also coincidentally 30s).It's not a trivial change since there are currently over 200 implementations of
MasterNodeRequest
, all of which currently rely on the default behaviour, so it's going to take a few steps to complete this work. Therefore I'm opening this meta-issue to track the work needed in this area.The text was updated successfully, but these errors were encountered: