Docs for custom objectives could cover more details #10105
Comments
No, it doesn't. We work around it by clipping, but there's no proof that the result converges. So yes, XGBoost can run; whether the result is useful is a different question.
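The clipping mentioned above can be sketched in NumPy. This is a minimal illustration of the idea, not XGBoost's internal code; the threshold `eps` is an assumed value for demonstration:

```python
import numpy as np

def clip_hessian(hess: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Guard a per-sample Hessian against zero or negative entries.

    A boosted leaf weight has the form -sum(grad) / (sum(hess) + lambda),
    so non-positive Hessian entries can make a Newton step diverge;
    clipping keeps every step finite.  The threshold used here is an
    illustrative choice, not the value XGBoost uses internally.
    """
    return np.maximum(hess, eps)

print(clip_hessian(np.array([-0.3, 0.0, 0.5])))  # [1.e-06 1.e-06 5.e-01]
```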
Yes, you can use the expected Hessian. Again, I don't have a proof that it converges, but experiments with probabilistic forecasting seem promising.
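As a concrete example of why the expected Hessian helps in probabilistic forecasting, consider a Gaussian likelihood with a predicted `log_sigma` (a hypothetical sketch, not an objective shipped with XGBoost):

```python
import numpy as np

def gaussian_nll_grad_hess(y, mu, log_sigma):
    """Gradient and *expected* Hessian of the Gaussian NLL w.r.t. log_sigma.

    The exact second derivative, 2 * (y - mu)**2 * exp(-2 * log_sigma),
    vanishes whenever y == mu, so an exact Newton step can blow up.
    Taking the expectation over y ~ N(mu, sigma^2) replaces it with the
    constant 2 (the Fisher information), which is always positive.
    """
    grad = 1.0 - (y - mu) ** 2 * np.exp(-2.0 * log_sigma)
    hess = np.full_like(grad, 2.0)  # E[2 * (y - mu)^2 / sigma^2] = 2
    return grad, hess

y = np.array([0.0, 1.0])
mu = np.array([0.0, 0.0])
g, h = gaussian_nll_grad_hess(y, mu, np.zeros(2))
print(g, h)  # [1. 0.] [2. 2.]
```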
As long as you can come up with an approximation of the Hessian that is 0- or 1-dimensional per sample (at most 2-dimensional for the whole dataset), it should be supported. For instance, the softmax cross entropy uses the diagonal of the Hessian.
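The diagonal-only softmax Hessian mentioned above can be sketched in NumPy (a minimal illustration, not XGBoost's internal implementation):

```python
import numpy as np

def softmax_diag_hessian(logits: np.ndarray) -> np.ndarray:
    """Diagonal of the softmax cross-entropy Hessian, one row per sample.

    The full per-sample Hessian is n_classes x n_classes, but only a
    2-dim (n_samples, n_classes) array is needed: the diagonal entries
    p_k * (1 - p_k) are kept and the off-diagonal terms are dropped.
    """
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return p * (1.0 - p)

h = softmax_diag_hessian(np.array([[0.0, 0.0, 0.0]]))
print(h)  # each p_k = 1/3, so every entry is (1/3) * (2/3) = 2/9
```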
I can work on these things later. Thank you for raising the issue.
But is that the best approach for multinomial logistic? Would such a procedure be guaranteed to converge for general optimization? Take GLMNET, for example: in its multinomial variant that solves for one class at a time, it recalculates the gradients and Hessians after every single-class Newton iteration, instead of after a full cycle over all classes as XGBoost does. And in its variant that solves all classes at the same time, it uses a diagonal upper bound of the Hessian instead of the true diagonal. I did some experiments, and the diagonal Hessian does seem to perform substantially better than their upper bound, but I don't know whether it has the same theoretical convergence properties. In any event, it would still be ideal for the docs to mention that multi-valued response objectives need a diagonal approximation of the Hessian, and that with the multi-tree approach the gradients and Hessians are not recalculated after every single tree update (maybe there could be an option to do so?).
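The one-class-at-a-time recomputation described above can be shown with a toy solver. This is a hypothetical sketch for intuition, not GLMNET's actual algorithm (no regularization path, dense diagonal-weight Newton steps, tiny ridge added only for numerical safety):

```python
import numpy as np

def cyclic_newton_multinomial(X, y_onehot, n_cycles=10):
    """Toy one-class-at-a-time Newton solver for multinomial logistic.

    The point being illustrated: class probabilities are recomputed
    after *every* single-class Newton step, rather than once per full
    cycle over the classes as in a multi-tree boosting round.
    """
    n, d = X.shape
    k = y_onehot.shape[1]
    B = np.zeros((d, k))  # one coefficient vector per class
    for _ in range(n_cycles):
        for c in range(k):
            # Recompute probabilities before touching class c.
            z = X @ B
            p = np.exp(z - z.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)
            grad = X.T @ (p[:, c] - y_onehot[:, c])
            w = p[:, c] * (1.0 - p[:, c])           # diagonal Hessian weights
            H = X.T @ (X * w[:, None]) + 1e-6 * np.eye(d)
            B[:, c] -= np.linalg.solve(H, grad)
    return B

# On a tiny separable problem the solver recovers the right classes.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
B = cyclic_newton_multinomial(X, np.eye(2))
print((X @ B).argmax(axis=1))  # [0 1]
```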
I haven't looked into the referenced paper yet. Actually, XGBoost uses the diagonal multiplied by a constant 2. There's a proof, based on conditional random fields, that this is an upper bound; the bound is then transferred to the logistic function by removing the edge potential from the CRF. As for regression, we simply calculate the pointwise gradient.
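In NumPy terms, the factor-2 diagonal just described looks like this (a sketch of the formulas, not XGBoost's internal C++):

```python
import numpy as np

def softprob_grad_hess(p: np.ndarray, y_onehot: np.ndarray):
    """Gradient and factor-2 diagonal Hessian for softmax cross entropy.

    grad = p - y, and hess = 2 * p * (1 - p): the diagonal multiplied by
    the constant 2 that makes it an upper bound on the full per-sample
    Hessian, keeping the per-class Newton step conservative.
    """
    grad = p - y_onehot
    hess = 2.0 * p * (1.0 - p)
    return grad, hess

p = np.array([[0.5, 0.3, 0.2]])
y = np.array([[1.0, 0.0, 0.0]])
g, h = softprob_grad_hess(p, y)
print(g)  # [[-0.5  0.3  0.2]]
print(h)  # [[0.5   0.42  0.32]]
```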
Actually, no need to look at the paper. After reading it in more detail, I now notice they use a different upper bound for multinomial because their least-squares algorithm only supports having the same weight/Hessian per observation for each class, while XGBoost doesn't have that limitation. On a deeper look, I also notice that for the more general case they derive exactly the same multinomial upper bound, "twice the diagonal of the Hessian", which is what XGBoost is using. I guess in the general case of arbitrary objectives, the recommended procedure for Hessians that have dependencies between targets could be to calculate a diagonal bound by summing the absolute values row-wise, for each observation, in the full per-observation Hessian of dimensions n_targets x n_targets.
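The row-wise absolute-sum bound can be checked numerically for the softmax case (a NumPy sketch; the probabilities are made up):

```python
import numpy as np

# For one observation with class probabilities p, the full softmax
# cross-entropy Hessian is H = diag(p) - p p^T.  Summing |h_ij| over
# each row gives p_i*(1-p_i) + p_i * sum_{j != i} p_j
#              = 2 * p_i * (1 - p_i),
# i.e. exactly twice the diagonal.
p = np.array([0.5, 0.3, 0.2])
H = np.diag(p) - np.outer(p, p)
row_abs_sum = np.abs(H).sum(axis=1)
print(row_abs_sum)            # matches 2 * p * (1 - p)
print(2.0 * p * (1.0 - p))
```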
I'm not sure about the intuition behind the summation suggestion yet. Will give it more thought. Could you please elaborate?
It's straight from the definition of a diagonally dominant matrix, and it's what the GLMNET paper referenced in their proof. The idea is that for an arbitrary symmetric Hessian H, the diagonal matrix D with entries d_ii = sum_j |h_ij| is guaranteed to upper-bound H: D - H is diagonally dominant with a nonnegative diagonal, hence positive semi-definite. So in the case of multinomial logistic, for a particular observation h_ii = p_i(1 - p_i) and h_ij = -p_i p_j, and since sum_{j != i} |h_ij| = p_i * sum_{j != i} p_j = p_i(1 - p_i), the row-wise absolute sum comes out to exactly 2 p_i(1 - p_i), i.e. twice the diagonal.
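The diagonal-dominance argument is easy to verify numerically for any symmetric matrix (a NumPy sketch with a random matrix):

```python
import numpy as np

# Check that D = diag(sum_j |h_ij|) upper-bounds H: the difference
# D - H is symmetric and diagonally dominant with a nonnegative
# diagonal, hence positive semi-definite.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2.0                      # arbitrary symmetric matrix
D = np.diag(np.abs(H).sum(axis=1))       # row-wise absolute sums
eigs = np.linalg.eigvalsh(D - H)
print(eigs.min() >= -1e-12)  # True: D - H is PSD
```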
Thank you for the detailed explanation! |
It would also be ideal to mention in the example for custom multi-target objectives that what the multinomial example there calls the "hessian" is not the true Hessian but a diagonal upper bound on it, which can otherwise be quite confusing.
From reading the docs, it's quite unclear what exactly is or isn't supported as custom objectives in XGBoost.
For example, one might wonder after reading the docs: