[ML] Set ml.config_version attribute even when ML is disabled

Even if ML is disabled on a node we need to set the ML config version attribute, as an old node could still affect what ML functionality is safe to use. Additionally, in the code that was supposed to avoid a crash when the attribute was missing we were catching the wrong type of exception.
droberts195 · Feb 2, 2024 · 659493a · 659493a
1 parent e87f58d
commit 659493a
Show file tree

Hide file tree

Showing 2 changed files with 13 additions and 8 deletions.
diff --git a/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/MlConfigVersion.java b/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/MlConfigVersion.java
@@ -289,7 +289,7 @@ public static Tuple<MlConfigVersion, MlConfigVersion> getMinMaxMlConfigVersion(D
                 if (mlConfigVersion.after(maxMlConfigVersion)) {
                     maxMlConfigVersion = mlConfigVersion;
                 }
-            } catch (IllegalArgumentException e) {
+            } catch (IllegalStateException e) {
                 // This means we encountered a node that is after 8.10.0 but has the ML plugin disabled - ignore it
             }
         }

diff --git a/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MachineLearning.java b/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MachineLearning.java
@@ -829,13 +829,19 @@ public Settings additionalSettings() {
         String allocatedProcessorsAttrName = "node.attr." + ALLOCATED_PROCESSORS_NODE_ATTR;
         String mlConfigVersionAttrName = "node.attr." + ML_CONFIG_VERSION_NODE_ATTR;
 
-        if (enabled == false) {
-            disallowMlNodeAttributes(maxOpenJobsPerNodeNodeAttrName, machineMemoryAttrName, jvmSizeAttrName, mlConfigVersionAttrName);
-            return Settings.EMPTY;
-        }
-
         Settings.Builder additionalSettings = Settings.builder();
-        if (DiscoveryNode.hasRole(settings, DiscoveryNodeRole.ML_ROLE)) {
+
+        // The ML config version is needed for two related reasons even if ML is currently disabled on the node:
+        // 1. If ML is in use then decisions about minimum node versions need to include this node, and not
+        // having it available can cause exceptions during cluster state processing
+        // 2. It could be argued that reason 1 could be fixed by completely ignoring the node, however,
+        // then there would be a risk that ML is later enabled on an old node that was ignored, and
+        // some new ML feature that's been used is then incompatible with it
+        // The only safe approach is to consider which ML code _all_ nodes in the cluster are running, regardless
+        // of whether they currently have ML enabled.
+        addMlNodeAttribute(additionalSettings, mlConfigVersionAttrName, MlConfigVersion.CURRENT.toString());
+
+        if (enabled && DiscoveryNode.hasRole(settings, DiscoveryNodeRole.ML_ROLE)) {
             addMlNodeAttribute(
                 additionalSettings,
                 machineMemoryAttrName,
@@ -859,7 +865,6 @@ public Settings additionalSettings() {
                 allocatedProcessorsAttrName
             );
         }
-        addMlNodeAttribute(additionalSettings, mlConfigVersionAttrName, MlConfigVersion.CURRENT.toString());
         return additionalSettings.build();
     }