Skip to content

Commit

Permalink
[ML] Set ml.config_version attribute even when ML is disabled
Browse files Browse the repository at this point in the history
Even if ML is disabled on a node we need to set the ML config
version attribute, as an old node could still affect what ML
functionality is safe to use.

Additionally, in the code that was supposed to avoid a crash
when the attribute was missing we were catching the wrong type
of exception.
  • Loading branch information
droberts195 committed Feb 2, 2024
1 parent e87f58d commit 659493a
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 8 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -289,7 +289,7 @@ public static Tuple<MlConfigVersion, MlConfigVersion> getMinMaxMlConfigVersion(D
if (mlConfigVersion.after(maxMlConfigVersion)) {
maxMlConfigVersion = mlConfigVersion;
}
} catch (IllegalArgumentException e) {
} catch (IllegalStateException e) {
// This means we encountered a node that is after 8.10.0 but has the ML plugin disabled - ignore it
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -829,13 +829,19 @@ public Settings additionalSettings() {
String allocatedProcessorsAttrName = "node.attr." + ALLOCATED_PROCESSORS_NODE_ATTR;
String mlConfigVersionAttrName = "node.attr." + ML_CONFIG_VERSION_NODE_ATTR;

if (enabled == false) {
disallowMlNodeAttributes(maxOpenJobsPerNodeNodeAttrName, machineMemoryAttrName, jvmSizeAttrName, mlConfigVersionAttrName);
return Settings.EMPTY;
}

Settings.Builder additionalSettings = Settings.builder();
if (DiscoveryNode.hasRole(settings, DiscoveryNodeRole.ML_ROLE)) {

// The ML config version is needed for two related reasons even if ML is currently disabled on the node:
// 1. If ML is in use then decisions about minimum node versions need to include this node, and not
// having it available can cause exceptions during cluster state processing
// 2. It could be argued that reason 1 could be fixed by completely ignoring the node, however,
// then there would be a risk that ML is later enabled on an old node that was ignored, and
// some new ML feature that's been used is then incompatible with it
// The only safe approach is to consider which ML code _all_ nodes in the cluster are running, regardless
// of whether they currently have ML enabled.
addMlNodeAttribute(additionalSettings, mlConfigVersionAttrName, MlConfigVersion.CURRENT.toString());

if (enabled && DiscoveryNode.hasRole(settings, DiscoveryNodeRole.ML_ROLE)) {
addMlNodeAttribute(
additionalSettings,
machineMemoryAttrName,
Expand All @@ -859,7 +865,6 @@ public Settings additionalSettings() {
allocatedProcessorsAttrName
);
}
addMlNodeAttribute(additionalSettings, mlConfigVersionAttrName, MlConfigVersion.CURRENT.toString());
return additionalSettings.build();
}

Expand Down

0 comments on commit 659493a

Please sign in to comment.