DOC fix references for gradient boosting (#23035)
lorentzenchr committed Apr 8, 2022
1 parent 48a3c0b commit a4bdf3c
Showing 1 changed file with 27 additions and 16 deletions.
doc/modules/ensemble.rst
@@ -458,10 +458,9 @@ Gradient Tree Boosting

`Gradient Tree Boosting <https://en.wikipedia.org/wiki/Gradient_boosting>`_
or Gradient Boosted Decision Trees (GBDT) is a generalization
-of boosting to arbitrary
-differentiable loss functions. GBDT is an accurate and effective
-off-the-shelf procedure that can be used for both regression and
-classification problems in a
+of boosting to arbitrary differentiable loss functions, see the seminal work of
+[Friedman2001]_. GBDT is an accurate and effective off-the-shelf procedure that can be
+used for both regression and classification problems in a
variety of areas including Web search ranking and ecology.

The module :mod:`sklearn.ensemble` provides methods
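
A minimal sketch of this off-the-shelf usage, assuming only the standard
:class:`~sklearn.ensemble.GradientBoostingClassifier` API (the synthetic
dataset is purely illustrative)::

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # A synthetic problem stands in for a real classification task.
    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Default hyperparameters already give a reasonable baseline.
    clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # mean accuracy on held-out data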
@@ -623,7 +622,7 @@ We found that ``max_leaf_nodes=k`` gives comparable results to ``max_depth=k-1``
but is significantly faster to train at the expense of a slightly higher
training error.
The parameter ``max_leaf_nodes`` corresponds to the variable ``J`` in the
-chapter on gradient boosting in [F2001]_ and is related to the parameter
+chapter on gradient boosting in [Friedman2001]_ and is related to the parameter
``interaction.depth`` in R's gbm package where ``max_leaf_nodes == interaction.depth + 1`` .
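
As a small sketch of this correspondence (``k`` is a hypothetical value, not a
recommendation)::

    from sklearn.ensemble import GradientBoostingRegressor

    k = 8
    # Best-first tree growth capped at k leaves per tree ...
    gbrt_leaves = GradientBoostingRegressor(max_leaf_nodes=k)
    # ... is roughly comparable to depth-wise growth capped at k - 1,
    # trading a slightly higher training error for faster training.
    gbrt_depth = GradientBoostingRegressor(max_depth=k - 1)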

Mathematical formulation
@@ -635,12 +634,12 @@ case.
Regression
^^^^^^^^^^

-GBRT regressors are additive models whose prediction :math:`y_i` for a
+GBRT regressors are additive models whose prediction :math:`\hat{y}_i` for a
given input :math:`x_i` is of the following form:

.. math::
-   \hat{y_i} = F_M(x_i) = \sum_{m=1}^{M} h_m(x_i)
+   \hat{y}_i = F_M(x_i) = \sum_{m=1}^{M} h_m(x_i)

where the :math:`h_m` are estimators called *weak learners* in the context
of boosting. Gradient Tree Boosting uses :ref:`decision tree regressors
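
A minimal sketch of this additive form, assuming only the public
``staged_predict`` API (dataset and sizes are illustrative)::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=200, random_state=0)
    gbrt = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

    # staged_predict yields F_1(x), ..., F_M(x), the accumulated predictions
    # after each weak learner; the final stage coincides with predict(X).
    last_stage = list(gbrt.staged_predict(X))[-1]
    assert np.allclose(last_stage, gbrt.predict(X))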
@@ -755,7 +754,7 @@ the parameter ``loss``:
target values.
* Huber (``'huber'``): Another robust loss function that combines
least squares and least absolute deviation; use ``alpha`` to
-control the sensitivity with regards to outliers (see [F2001]_ for
+control the sensitivity with regards to outliers (see [Friedman2001]_ for
more details).
* Quantile (``'quantile'``): A loss function for quantile regression.
Use ``0 < alpha < 1`` to specify the quantile. This loss function
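
A hedged sketch of these two robust losses (the ``alpha`` values are
illustrative only)::

    from sklearn.ensemble import GradientBoostingRegressor

    # Huber loss: alpha tunes the sensitivity to outliers.
    huber = GradientBoostingRegressor(loss="huber", alpha=0.9)

    # Quantile loss: the fitted model approximates the alpha-quantile of
    # y given x, here the 90th percentile, enabling prediction intervals.
    q90 = GradientBoostingRegressor(loss="quantile", alpha=0.9)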
@@ -785,7 +784,7 @@ the parameter ``loss``:
Shrinkage via learning rate
---------------------------

-[F2001]_ proposed a simple regularization strategy that scales
+[Friedman2001]_ proposed a simple regularization strategy that scales
the contribution of each weak learner by a constant factor :math:`\nu`:

.. math::
@@ -809,7 +808,7 @@ stopping. For a more detailed discussion of the interaction between
Subsampling
-----------

-[F1999]_ proposed stochastic gradient boosting, which combines gradient
+[Friedman2002]_ proposed stochastic gradient boosting, which combines gradient
boosting with bootstrap averaging (bagging). At each iteration
the base classifier is trained on a fraction ``subsample`` of
the available training data. The subsample is drawn without replacement.
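
A minimal sketch combining both regularization ideas, shrinkage and
subsampling (values are illustrative, not recommendations)::

    from sklearn.ensemble import GradientBoostingClassifier

    clf = GradientBoostingClassifier(
        learning_rate=0.1,  # shrinkage factor, the nu from above
        subsample=0.5,      # each tree is fit on a random half of the data
        n_estimators=200,   # smaller learning rates need more iterations
        random_state=0,
    )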
@@ -896,6 +895,19 @@ based on permutation of the features.

* :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regression.py`

+.. topic:: References
+
+   .. [Friedman2001] Friedman, J.H. (2001). :doi:`Greedy function approximation: A gradient
+     boosting machine <10.1214/aos/1013203451>`.
+     Annals of Statistics, 29, 1189-1232.
+
+   .. [Friedman2002] Friedman, J.H. (2002). `Stochastic gradient boosting
+     <https://statweb.stanford.edu/~jhf/ftp/stobst.pdf>`_.
+     Computational Statistics & Data Analysis, 38, 367-378.
+
+   .. [R2007] G. Ridgeway (2006). `Generalized Boosted Models: A guide to the gbm
+     package <https://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf>`_
+
.. _histogram_based_gradient_boosting:

Histogram-Based Gradient Boosting
@@ -1210,17 +1222,16 @@ Finally, many parts of the implementation of

.. topic:: References

-   .. [F1999] Friedmann, Jerome H., 2007, `"Stochastic Gradient Boosting"
-     <https://statweb.stanford.edu/~jhf/ftp/stobst.pdf>`_
-
-   .. [R2007] G. Ridgeway, "Generalized Boosted Models: A guide to the gbm
-     package", 2007
.. [XGBoost] Tianqi Chen, Carlos Guestrin, :arxiv:`"XGBoost: A Scalable Tree
Boosting System" <1603.02754>`
.. [LightGBM] Ke et al. `"LightGBM: A Highly Efficient Gradient
  Boosting Decision Tree" <https://papers.nips.cc/paper/
  6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`_
-   .. [Fisher1958] Walter D. Fisher. `"On Grouping for Maximum Homogeneity"
-     <http://www.csiss.org/SPACE/workshops/2004/SAC/files/fisher.pdf>`_
+   .. [Fisher1958] Fisher, W.D. (1958). `"On Grouping for Maximum Homogeneity"
+     <http://csiss.ncgia.ucsb.edu/SPACE/workshops/2004/SAC/files/fisher.pdf>`_
+     Journal of the American Statistical Association, 53, 789-798.
.. _voting_classifier:

