DOC minor improvements of Features in Histogram Gradient Boosting Trees (#28858)
lorentzenchr committed Apr 19, 2024
1 parent 0f701ff commit 3af257e
48 changes: 26 additions & 22 deletions examples/ensemble/plot_hgbt_regression.py
@@ -13,10 +13,10 @@
The top usability features of HGBT models are:
1. Several available loss function for mean and quantile regression tasks, see
1. Several available loss functions for mean and quantile regression tasks, see
:ref:`Quantile loss <quantile_support_hgbdt>`.
2. :ref:`categorical_support_gbdt` (see
:ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py`).
2. :ref:`categorical_support_gbdt`, see
:ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py`.
3. Early stopping.
4. :ref:`nan_support_hgbt`, which avoids the need for an imputer.
5. :ref:`monotonic_cst_gbdt`.
@@ -38,7 +38,7 @@
# set every five minutes. Electricity transfers to/from the neighboring state of
# Victoria were done to alleviate fluctuations.
#
# The dataset (originally named ELEC2) contains 45,312 instances dated from 7
# The dataset, originally named ELEC2, contains 45,312 instances dated from 7
# May 1996 to 5 December 1998. Each sample of the dataset refers to a period of
# 30 minutes, i.e. there are 48 instances for each time period of one day. Each
# sample on the dataset has 7 columns:
@@ -48,7 +48,7 @@
# - nswprice/nswdemand: electricity price/demand of New South Wales;
# - vicprice/vicdemand: electricity price/demand of Victoria.
#
# It is originally a classification task, but here we use it for the regression
# Originally, it is a classification task, but here we use it for the regression
# task to predict the scheduled electricity transfer between states.

from sklearn.datasets import fetch_openml
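# As a rough sketch (the exact loading code is elided from this diff), the data
# can be fetched from OpenML; the dataset name "electricity" is an assumption here:

electricity = fetch_openml(name="electricity", version=1, as_frame=True)
df = electricity.frame  # DataFrame with the columns described above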
@@ -86,7 +86,7 @@
_ = ax.legend(handles, ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"])

# %%
# Notice energy transfer increases systematically during weekends.
# Notice that energy transfer increases systematically during weekends.
#
# Effect of number of trees and early stopping
# ============================================
@@ -95,7 +95,7 @@
# daily electricity transfer using the whole dataset. Then we visualize its
# predictions depending on the `max_iter` parameter. Here we don't try to
# evaluate the performance of the model and its capacity to generalize but
# rather its capacity to learn from the training data.
# rather its capability to learn from the training data.

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
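# A hedged sketch of the comparison described above; `X` and `y` stand for the
# features and the transfer target extracted from the dataframe, and the actual
# training code of this section is elided from the diff:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

for max_iter in (5, 50, 500):
    hgbt = HistGradientBoostingRegressor(max_iter=max_iter, early_stopping=False)
    hgbt.fit(X_train, y_train)
    print(f"max_iter={max_iter}, R^2 on held-out data: {hgbt.score(X_test, y_test):.3f}")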
@@ -144,22 +144,24 @@
# With just a few iterations, HGBT models can achieve convergence (see
# :ref:`sphx_glr_auto_examples_ensemble_plot_forest_hist_grad_boosting_comparison.py`),
# meaning that adding more trees does not improve the model anymore. In the
# figure above, 5 iterations are not enough to be able to predict. With 50
# figure above, 5 iterations are not enough to get good predictions. With 50
# iterations, we are already able to do a good job.
#
# Instead of relying on `max_iter` alone to determine when to stop, the HGBT
# implementation in scikit-learn supports early stopping. With it, the model
# Setting `max_iter` too high might degrade the prediction quality and cost a lot of
# avoidable computing resources. Therefore, the HGBT implementation in scikit-learn
# provides an automatic **early stopping** strategy. With it, the model
# uses a fraction of the training data as an internal validation set
# (`validation_fraction`) and stops training if the validation score does not
# improve (or degrades) after `n_iter_no_change` iterations up to a certain
# `tol`.
# tolerance (`tol`).
#
# Notice that there is a trade-off between `learning_rate` and `max_iter`:
# Generally, smaller learning rates are preferable but require more iterations
# to converge to the minimum loss, while larger learning rates converge faster
# (fewer iterations/trees needed) but at the cost of a larger minimum loss.
#
# Indeed, a good practice is to tune the learning rate along with any other
# Because of this high correlation between the learning rate and the number of
# iterations, a good practice is to tune the learning rate along with all other
# important hyperparameters, fit the HGBT on the training set with a large enough value
# for `max_iter` and determine the best `max_iter` via early stopping and some
# explicit `validation_fraction`.
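#
# As an illustration, a minimal early-stopping configuration could look like the
# following sketch (the parameter values are arbitrary, not the ones tuned in this
# example, and `X_train`/`y_train` come from the split above):

hgbt = HistGradientBoostingRegressor(
    max_iter=1_000,            # generous upper bound, early stopping decides when to stop
    learning_rate=0.3,
    early_stopping=True,
    validation_fraction=0.2,   # internal validation set
    n_iter_no_change=5,
    tol=1e-7,
)
hgbt.fit(X_train, y_train)
print(f"trees actually built: {hgbt.n_iter_}")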
@@ -204,9 +206,9 @@
# HGBT models have native support for missing values. During training, the tree
# grower decides where samples with missing values should go (left or right
# child) at each split, based on the potential gain. When predicting, these
# samples are sent to either child accordingly. If a feature had no missing
# values during training, samples with missing values for that feature are sent
# to the child with the most samples.
# samples are sent to the corresponding learnt child. If a feature had no missing
# values during training, then for prediction, samples with missing values for that
# feature are sent to the child with the most samples (as seen during fit).
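#
# As a minimal illustration (not part of the original example code, which defines its
# own helper for this), a model can be fit directly on data containing NaNs;
# `X_train`/`y_train` refer to the split above and the column name is an assumption:

import numpy as np

X_train_nan = X_train.copy()
X_train_nan.loc[X_train_nan.sample(frac=0.1, random_state=0).index, "nswprice"] = np.nan
hgbt_nan = HistGradientBoostingRegressor().fit(X_train_nan, y_train)  # no imputer needed

# %%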
#
# The present example shows how HGBT regressions deal with values missing
# completely at random (MCAR), i.e. the missingness does not depend on the
@@ -313,10 +315,10 @@ def generate_missing_values(X, missing_fraction):
# model. One can still improve the quality of such estimations by:
#
# - collecting more data-points;
# - better tuning of the model hyperparameters (see
# :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_quantile.py`);
# - engineering more predictive features from the same data (see
# :ref:`sphx_glr_auto_examples_applications_plot_cyclical_feature_engineering.py`).
# - better tuning of the model hyperparameters, see
# :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_quantile.py`;
# - engineering more predictive features from the same data, see
# :ref:`sphx_glr_auto_examples_applications_plot_cyclical_feature_engineering.py`.
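#
# For reference, a hedged sketch of how such quantile estimates are typically
# obtained with HGBT (the models of this section are elided from this diff;
# `X_train`, `y_train` and `X_test` are assumed from the split above):

quantile_models = {
    q: HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X_train, y_train)
    for q in (0.05, 0.95)
}
y_lower = quantile_models[0.05].predict(X_test)
y_upper = quantile_models[0.95].predict(X_test)  # rough 90% prediction interval

# %%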
#
# Monotonic constraints
# =====================
@@ -342,7 +344,8 @@ def generate_missing_values(X, missing_fraction):
# - 0: no constraint
# - -1: monotonic decrease
#
# Else, one can pass an array-like encoding the above convention by position.
# Alternatively, one can pass an array-like object encoding the above convention by
# position.
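#
# A minimal sketch of both ways to pass the constraints; the constrained feature
# names follow the dataset description above and the positional order is an
# assumption, not necessarily the one used in this example:

hgbt_cst = HistGradientBoostingRegressor(monotonic_cst={"nswdemand": 1, "vicdemand": 1})

# equivalent positional encoding, assuming the feature order
# [day, period, nswprice, nswdemand, vicprice, vicdemand]:
hgbt_cst = HistGradientBoostingRegressor(monotonic_cst=[0, 0, 0, 1, 0, 1])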

from sklearn.inspection import PartialDependenceDisplay

@@ -394,8 +397,9 @@ def generate_missing_values(X, missing_fraction):
_ = plt.legend()

# %%
# Observe that `nswdemand` seems already monotonic without constraint. This is a
# good example to show that the model is "overconstraining".
# Observe that `nswdemand` and `vicdemand` already seem monotonic without any
# constraint. This is a good example to show that imposing monotonic constraints on
# such features is "over-constraining" the model.
#
# Additionally, we can verify that the predictive quality of the model is not
# significantly degraded by introducing the monotonic constraints. For such
