Instabilities in sample weights in trees #4366
ping @glouppe
The output is different because in the left branch there is a tie for the Gini score: X[5] and X[16] give identical impurity improvement (gini = 0.48). I see two explanations:
Do you think this is worth looking into? It is a bit surprising from a user's perspective. I realize that at some point we can't do much, "because finite precision".
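The tie @glouppe describes is between candidate splits with equal weighted impurity decrease. A minimal sketch of that quantity, assuming the textbook weighted Gini definition (this is not scikit-learn's Cython criterion code; the helper names `gini`, `impurity_decrease` and `scale` and the class-weight vectors are hypothetical), shows that multiplying every weight by the same constant leaves it unchanged in exact arithmetic, while floating point only approximates it:

```python
from fractions import Fraction


def gini(class_weights):
    """Weighted Gini impurity 1 - sum_k (w_k / W)**2 of one node."""
    total = sum(class_weights)
    if total == 0:
        return 0
    return 1 - sum((w / total) ** 2 for w in class_weights)


def impurity_decrease(left, right):
    """Parent impurity minus the weight-averaged impurity of the two children.

    `left` and `right` hold the total sample weight of each class in each child.
    """
    parent = [l + r for l, r in zip(left, right)]
    w_left, w_right = sum(left), sum(right)
    w_parent = w_left + w_right
    return (gini(parent)
            - (w_left / w_parent) * gini(left)
            - (w_right / w_parent) * gini(right))


def scale(weights, c):
    """Multiply every weight by the same constant c."""
    return [c * w for w in weights]


def frac(weights):
    """Convert weights to exact rationals."""
    return [Fraction(w) for w in weights]


# Two hypothetical candidate splits of a node with class weights [4, 6]:
# split B is split A with its children swapped, so they tie exactly.
left_a, right_a = [3, 1], [1, 5]
left_b, right_b = right_a, left_a

# Exact rational arithmetic: the tie holds, and scaling by c changes nothing.
c = Fraction(1, 10)
exact_a = impurity_decrease(frac(left_a), frac(right_a))
exact_b = impurity_decrease(frac(left_b), frac(right_b))
exact_a_scaled = impurity_decrease(scale(frac(left_a), c), scale(frac(right_a), c))
print(exact_a, exact_b, exact_a_scaled)  # 49/300 49/300 49/300

# Floating point: same formulas with weights scaled by 0.1. The operations run
# in a different order for A and B, so rounding may leave a difference of a
# few ulps between them (or none).
float_a = impurity_decrease(scale(left_a, 0.1), scale(right_a, 0.1))
float_b = impurity_decrease(scale(left_b, 0.1), scale(right_b, 0.1))
print(float_a - float_b)
```

Whether the two floating-point values come out bit-identical depends on the order of the operations, which is how an exact mathematical tie can turn into an almost-tie in practice.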
Referring to the toy example in #4347 ...
The second tree in this little ensemble differs even more between the implementations on master:
It seems you might be right, @glouppe, that the big differences between trees are due to the evaluation of ties, or floating-point almost-ties. This is probably why the feature importances change, and I guess the differences in out-of-sample probabilities might be due to the varying make-up of those observations.
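How a tie is resolved depends on the order in which candidate features are evaluated and on the comparison used. A simplified model, assuming the search keeps a new candidate only when it is strictly better (the `best_split` helper and the improvement value 0.16 are hypothetical, not scikit-learn's splitter), shows that an exact tie goes to whichever feature is visited first, while a one-ulp difference flips the choice (`math.nextafter` needs Python 3.9+):

```python
import math


def best_split(candidates):
    """Return the feature with the largest improvement.

    `candidates` is a list of (feature_index, improvement) pairs in the order
    the features are visited. The comparison is strict, so on an exact tie the
    feature visited first is kept.
    """
    best_feature, best_improvement = None, -math.inf
    for feature, improvement in candidates:
        if improvement > best_improvement:
            best_feature, best_improvement = feature, improvement
    return best_feature


tie = 0.16  # hypothetical improvement shared by X[5] and X[16]

# Exact tie: the winner is decided purely by visiting order.
print(best_split([(5, tie), (16, tie)]))  # 5
print(best_split([(16, tie), (5, tie)]))  # 16

# Almost-tie: nudging X[16]'s improvement up by one ulp makes it win.
print(best_split([(5, tie), (16, math.nextafter(tie, 1.0))]))  # 16
```

Once a different feature wins at one node, every split below it is computed on a different partition of the samples, so a single flipped tie can change the whole subtree and the feature importances.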
BTW, I believe that the reported samples and values above are properly working off the same bootstrap sample here, just weighted differently due to
Closing, as I think @glouppe's comment is correct.
This has come up in #4347.
Changing all sample weights by a constant factor changes the output of the trees.
I thought it should not change the math, and the difference seems too large to be a floating-point issue:
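Scaling every weight by a constant is exact in real arithmetic but not in floating point, because the scaled weights are rounded and then accumulated. A minimal sketch, assuming weights are summed left to right in a plain loop (`accumulate` is a hypothetical helper, not library code):

```python
def accumulate(weights):
    """Left-to-right running sum, as a plain accumulation loop computes it."""
    total = 0.0
    for w in weights:
        total += w
    return total


unit_weights = [1.0] * 10
c = 0.1
scaled_weights = [c * w for w in unit_weights]

# Mathematically both results equal 1.0.
print(c * accumulate(unit_weights))  # 1.0
print(accumulate(scaled_weights))    # 0.9999999999999999
```

A one-ulp discrepancy like this is far too small to change a prediction directly, but it is enough to decide between two candidate splits whose impurity improvements are (almost) tied, and from that node down the trees then differ.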