There seems to be a discrepancy between the XGBoost documentation on Feature Interaction Constraints and its actual behavior in practice. The documentation suggests that using constraints like [[0, 1], [1, 3, 4]] would allow a feature (like 0) to interact with features 3 or 4 via an intermediary feature (like 1). However, empirical tests suggest this interaction does not occur as documented.
Steps to Reproduce
1. Create a dataset with three standardized normal variables.
2. Define the target variable y as the product of these variables.
3. Train an XGBoost model with the feature interaction constraint [[0, 1], [1, 2]].
4. Observe the resulting trees and model performance.
Expected Behavior: Based on the documentation, one would expect the model to learn interactions involving all three variables (0, 1, 2) in at least some trees.
Actual Behavior: The model does not seem to learn the expected product relationship, and none of the trees feature all three variables simultaneously.
If the observed behavior is correct, could you please update the documentation to reflect the actual constraint mechanism?
As a data scientist, I find that an interaction constraint limiting interactions only to a feature's immediate neighbors in the tree, as the documentation describes, is not truly useful. A more helpful semantics would prohibit features that are not allowed to interact from appearing together anywhere in the same tree, as seems to happen in practice, or at least within any branch originating from the root node.
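The rule the trees appear to follow in practice can be stated compactly: the set of features used in a whole tree must fit inside a single constraint group. A minimal sketch of that check (the helper name `tree_respects_groups` is hypothetical, not an XGBoost API):

```python
def tree_respects_groups(tree_features, groups):
    """True if every feature used in the tree falls within one constraint group."""
    used = set(tree_features)
    return any(used <= set(g) for g in groups)

groups = [[0, 1], [1, 2]]
print(tree_respects_groups([0, 1], groups))     # True: fits group [0, 1]
print(tree_respects_groups([0, 1, 2], groups))  # False: no single group holds all three
```

Under the documented semantics the second call would be allowed (0 interacting with 2 via 1), which is exactly the discrepancy this issue describes.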