Poor out-of-sample accuracy with reg:absoluteerror
#7674
Hi, thank you for posting an interesting question. I can't start investigating at the moment and will come back to your examples.
I think it's been proven to be the optimal value for the greedy splits.
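For reference, the usual derivation behind that claim (a quick sketch in the notation of the XGBoost paper, not quoted from the code): minimizing the second-order approximation of the objective over a single leaf weight $w$,

$$
\sum_{i \in \text{leaf}} \left( g_i\, w + \tfrac{1}{2} h_i\, w^2 \right) + \tfrac{1}{2} \lambda w^2
\;\;\Rightarrow\;\;
w^\ast = -\frac{\sum_i g_i}{\sum_i h_i + \lambda},
$$

so the leaf value is a regularized Newton step, and its quality depends directly on how informative the Hessian is.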
I tried a small random dataset; pseudo-Huber produces a really small Hessian:
This limits the step size for optimization. Making the slope of the pseudo-Huber loss configurable can improve the convergence quite significantly.
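To make that concrete, a small sketch (mine, with the default slope δ = 1) of how quickly the pseudo-Huber Hessian collapses as the residual grows:

```r
# Pseudo-Huber with slope delta: loss = delta^2 * (sqrt(1 + (r/delta)^2) - 1),
# where r = prediction - label.
delta = 1
r = c(0.1, 1, 10, 100)
grad = r / sqrt(1 + (r / delta)^2)   # bounded by delta
hess = (1 + (r / delta)^2)^(-3/2)    # shrinks like 1/|r|^3 for large residuals
round(cbind(residual = r, grad = grad, hess = hess), 6)
# For a single point with residual 100, the regularized Newton step
# -grad / (hess + lambda) with the default lambda = 1 is only about -1,
# so closing that gap takes hundreds of boosting rounds.
```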
One way to handle these types of objectives is to revise the tree leaf values after each iteration, which I plan to work on after this release.
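A rough sketch of that idea (mine, not the actual implementation): for an L1-style objective, the revised value for each leaf would be the median of the residuals that land in it, rather than the Newton step. Computing those per-leaf medians from R, for illustration only:

```r
library(xgboost)

set.seed(1)
x = rnorm(500)
y = x^2 + rnorm(500)

# Fit a single tree with any objective, just to get a tree structure.
m = xgboost(data = matrix(x), label = y, nrounds = 1, verbose = 0,
    params = list(objective = "reg:squarederror", base_score = 0))

leaf = predict(m, matrix(x), predleaf = TRUE)[, 1]  # leaf index of each row
resid = y - 0                                       # residuals vs. base_score = 0
tapply(resid, leaf, median)                         # L1-optimal value per leaf
```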
What does this mean? By the slope, do you mean the gradient? It looks like the gradient is already defined properly for this objective.
By the way, this kind of investigation may be easier with the somewhat simpler log-cosh objective function, with gradient `tanh(pred - label)` and Hessian `1 - tanh(pred - label)^2`.
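In case it helps with reproducing, a minimal sketch of such a log-cosh custom objective for the R package (my code, using the standard custom-objective interface; not something shipped with XGBoost):

```r
library(xgboost)

# loss = log(cosh(pred - label));
# gradient = tanh(pred - label), Hessian = 1 - tanh(pred - label)^2.
logcosh_obj = function(preds, dtrain)
   {labels = getinfo(dtrain, "label")
    t = tanh(preds - labels)
    list(grad = t, hess = 1 - t^2)}

# Usage, with a data matrix `x` and label vector `y`:
# m = xgboost(data = matrix(x), label = y, nrounds = 50, verbose = 0,
#     params = list(min_child_weight = 0), obj = logcosh_obj)
```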
No, I meant the delta term in pseudo-Huber, which is currently set to 1.
Oh, I see. Yeah, I was thinking that the δ value should be documented in the objective's description.
Yeah, I have a working branch for the customizable pseudo-Huber, but I need some refactoring in #7640 first. For the more general case where objectives produce a small Hessian, I think we will try to resolve it in the next release; that's also necessary for objectives like MAE and quantile regression.
FYI, XGBoost now lets you specify the slope for the pseudo-Huber loss: #7727
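For anyone reading later, the slope is exposed as the `huber_slope` parameter in releases that include that change; a quick sketch of setting it from R (my example, not taken from the PR):

```r
library(xgboost)

set.seed(1)
x = rnorm(500)
y = x^2 + rnorm(500)

m = xgboost(
    data = matrix(x), label = y,
    nrounds = 50, verbose = 0,
    params = list(
        objective = "reg:pseudohubererror",
        huber_slope = 5))  # the delta discussed above; defaults to 1
```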
Closing, since the pseudo-Huber loss now supports a custom slope.
I don't think that fixes the problem. Changing the slope just makes pseudo-Huber less like absolute error. |
@Kodiologist There is an experimental prototype for the L1 error: #7812. It implements an "adaptive tree" method where we can fit trees even when the Hessian is constant or not informative. So perhaps there is less need for the pseudo-Huber loss to behave like the absolute error?
Oh, very cool, I hadn't heard. I'll have to look into it.
Reopening this issue for now, since the L1 error is still experimental. |
I will work on optimal initialization for L1 later. However, to fully resolve the issue for custom objectives, we need to implement #7693. I'm still not entirely sure how we should move forward regarding the interface.
@trivialfis I'm getting poor fits with the new objective. Here's an example:

```r
rmse = function(x, y)
    sqrt(mean((x - y)^2))
mae = function(x, y)
    mean(abs(x - y))
r = function(x)
    round(x, 2)

set.seed(5)
N = 1000
x = rep(c(0L, 1L, 10L), len = N)
y = x^2 + rnorm(N)^2

message("RMSE original - ", r(rmse(y, x^2)))
message("MAE original - ", r(mae(y, x^2)))

for (loss in c("reg:squarederror", "reg:pseudohubererror", "reg:absoluteerror"))
   {m = xgboost::xgboost(
        verbose = 0,
        params = list(
            objective = loss,
            min_child_weight = 0,
            base_score = median(y)),
        data = matrix(x),
        label = y,
        nrounds = 50)
    p = predict(m, newdata = matrix(x))
    message("RMSE ", loss, " - ", r(rmse(y, p)))
    message("MAE ", loss, " - ", r(mae(y, p)))}
```

The result is:
@Kodiologist Could you please try one of the other tree methods? The adaptive tree is not implemented for the `exact` tree method; `hist` (or `approx`) should behave better here.
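Something along these lines, i.e., your earlier example with `tree_method` added to the params (reusing `x` and `y` from that snippet):

```r
m = xgboost::xgboost(
    verbose = 0,
    params = list(
        objective = "reg:absoluteerror",
        min_child_weight = 0,
        base_score = median(y),
        tree_method = "hist"),  # "approx" also works; "exact" lacks the adaptive-tree step
    data = matrix(x),
    label = y,
    nrounds = 50)
```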
Here are a few things I need to do to complete the support for the L1 error:
Thanks, that seems to work well in this case. Do you recommend
Intelligent selection of the default |
I wrote a brief introduction to the various tree methods (https://xgboost.readthedocs.io/en/stable/treemethod.html). For constant-Hessian objectives
Yes. Thank you for the suggestion. I have a POC implementation for this and will submit a PR after the work on categorical features is completed.
I've also noticed that my example problem has trouble with DART, even with these settings:

```r
rmse = function(x, y)
    sqrt(mean((x - y)^2))
mae = function(x, y)
    mean(abs(x - y))
r = function(x)
    round(x, 2)

set.seed(5)
N = 1000
x = rep(c(0L, 1L, 10L), len = N)
y = x^2 + rnorm(N)^2

message("RMSE original - ", r(rmse(y, x^2)))
message("MAE original - ", r(mae(y, x^2)))

for (loss in c("reg:squarederror", "reg:pseudohubererror", "reg:absoluteerror"))
    for (booster in c("gbtree", "dart"))
       {set.seed(8)
        m = xgboost::xgboost(
            verbose = 0,
            params = c(
                list(
                    objective = loss,
                    min_child_weight = 0,
                    base_score = median(y),
                    tree_method = "hist",
                    booster = booster),
                (if (booster == "dart") list(one_drop = 1))),
            data = matrix(x),
            label = y,
            nrounds = 50)
        p = predict(m, newdata = matrix(x))
        message("RMSE ", loss, " ", booster, " - ", r(rmse(y, p)))
        message("MAE ", loss, " ", booster, " - ", r(mae(y, p)))}
```

The result is:
@trivialfis I was excited to use (If you don't have the R package
The result (minus progress bars and progress messages) is:
reg:absoluteerror
Hi, sorry for the slow reply. I believe this is now addressed with the support for scaling with learning rate:
@trivialfis I'm not reproducing your results on |
Apologies, it's still on the master branch. |
Thanks, it works for me with |
(Original title: "Poor fit with absolute-error-like objective functions")
I've looked more into the problem I described in this discussion post, and I think there are two underlying problems, one easy to solve or work around, and one not so easy.
The easy part is that `min_child_weight` (at least at its default value, 1) seems to be pretty destructive to the accuracy of some objectives, such as pseudo-Huber. In the example below (copied from my earlier post), I get an MAE for pseudo-Huber of over 33 with the default `min_child_weight = 1`, but 0.9 with `min_child_weight = 0`. I think the default should probably be 0. Increasing `min_child_weight` probably isn't the first thing you'd want to try if your model came out too complex, anyway.

This isn't the whole story, though, because first, a log-cosh objective, defined by the loss `log(cosh(pred - label))` (gradient `tanh(pred - label)`, Hessian `1 - tanh(pred - label)^2`), still does quite badly in this case with `min_child_weight = 0`, and second, pseudo-Huber loss does better but still not good enough in the case of the trivial example. Here, if I set `min_child_weight = 0`, the predictions for pseudo-Huber become -0.0125 and 1.125, which are closer to but not equal to the expected answers, 0 and 1. Perhaps tellingly, if `base_score` is increased to 10, pseudo-Huber produces absurd predictions of [-999.9999, -728.0000], whereas the squared-error case still does the right thing.

I think the problem is not with how splits are chosen in the predictor space, but with how values are assigned to the leaves. In the function `CalcWeight` in `src/tree/param.h`, the leaf values come out to (in the unregularized case) the gradient divided by the Hessian. I don't really understand the motivation for this, despite consulting the paper, and it seems to work poorly in the case of pseudo-Huber, at least.
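To illustrate what I suspect is going on (a back-of-the-envelope sketch, not traced through the actual `CalcWeight` code): with `base_score = 10` and a true target of 0, a single observation's unregularized Newton step under pseudo-Huber is already around -1000, which is in the same ballpark as the absurd predictions above.

```r
# Pseudo-Huber with slope 1: grad = r / sqrt(1 + r^2), hess = (1 + r^2)^(-3/2),
# where r = prediction - label.
r = 10 - 0                  # base_score = 10, target = 0
grad = r / sqrt(1 + r^2)    # about 0.995
hess = (1 + r^2)^(-3/2)     # about 0.00099
-grad / hess                # about -1010: a single leaf can overshoot wildly
```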