Update documents and tests. (#7659)

* Revise documents after recent refactoring and cat support. * Add tests for behavior of max_depth and max_leaves.
dmlc · Feb 25, 2022 · 18a4af6 · 18a4af6
1 parent 5eed299
commit 18a4af6
Show file tree

Hide file tree

Showing 7 changed files with 142 additions and 44 deletions.
diff --git a/doc/parameter.rst b/doc/parameter.rst
@@ -74,8 +74,8 @@ Parameters for Tree Booster
 
 * ``max_depth`` [default=6]
 
-  - Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. 0 is only accepted in ``lossguide`` growing policy when ``tree_method`` is set as ``hist`` or ``gpu_hist`` and it indicates no limit on depth. Beware that XGBoost aggressively consumes memory when training a deep tree.
-  - range: [0,∞] (0 is only accepted in ``lossguide`` growing policy when ``tree_method`` is set as ``hist`` or ``gpu_hist``)
+  - Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. 0 indicates no limit on depth. Beware that XGBoost aggressively consumes memory when training a deep tree. ``exact`` tree method requires non-zero value.
+  - range: [0,∞]
 
 * ``min_child_weight`` [default=1]
 
@@ -164,7 +164,7 @@ Parameters for Tree Booster
 
   - Control the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: ``sum(negative instances) / sum(positive instances)``. See :doc:`Parameters Tuning </tutorials/param_tuning>` for more discussion. Also, see Higgs Kaggle competition demo for examples: `R <https://github.com/dmlc/xgboost/blob/master/demo/kaggle-higgs/higgs-train.R>`_, `py1 <https://github.com/dmlc/xgboost/blob/master/demo/kaggle-higgs/higgs-numpy.py>`_, `py2 <https://github.com/dmlc/xgboost/blob/master/demo/kaggle-higgs/higgs-cv.py>`_, `py3 <https://github.com/dmlc/xgboost/blob/master/demo/guide-python/cross_validation.py>`_.
 
-* ``updater`` [default= ``grow_colmaker,prune``]
+* ``updater``
 
   - A comma separated string defining the sequence of tree updaters to run, providing a modular way to construct and to modify the trees. This is an advanced parameter that is usually set automatically, depending on some other parameters. However, it could be also set explicitly by a user. The following updaters exist:
 
@@ -177,8 +177,6 @@ Parameters for Tree Booster
     - ``refresh``: refreshes tree's statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
     - ``prune``: prunes the splits where loss < min_split_loss (or gamma) and nodes that have depth greater than ``max_depth``.
 
-  - In a distributed setting, the implicit updater sequence value would be adjusted to ``grow_histmaker,prune`` by default, and you can set ``tree_method`` as ``hist`` to use ``grow_histmaker``.
-
 * ``refresh_leaf`` [default=1]
 
   - This is a parameter of the ``refresh`` updater. When this flag is 1, tree leafs as well as tree nodes' stats are updated. When it is 0, only node stats are updated.
@@ -194,19 +192,19 @@ Parameters for Tree Booster
 * ``grow_policy`` [default= ``depthwise``]
 
   - Controls a way new nodes are added to the tree.
-  - Currently supported only if ``tree_method`` is set to ``hist`` or ``gpu_hist``.
+  - Currently supported only if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
   - Choices: ``depthwise``, ``lossguide``
 
     - ``depthwise``: split at nodes closest to the root.
     - ``lossguide``: split at nodes with highest loss change.
 
 * ``max_leaves`` [default=0]
 
-  - Maximum number of nodes to be added. Only relevant when ``grow_policy=lossguide`` is set.
+  - Maximum number of nodes to be added.  Not used by ``exact`` tree method.
 
 * ``max_bin``, [default=256]
 
-  - Only used if ``tree_method`` is set to ``hist`` or ``gpu_hist``.
+  - Only used if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
   - Maximum number of discrete bins to bucket continuous features.
   - Increasing this number improves the optimality of splits at the cost of higher computation time.
 

diff --git a/doc/treemethod.rst b/doc/treemethod.rst
@@ -114,3 +114,32 @@ was never tested and contained some unknown bugs, we decided to remove it and fo
 resources on more promising algorithms instead.  For accuracy, most of the time
 ``approx``, ``hist`` and ``gpu_hist`` are enough with some parameters tuning, so removing
 them don't have any real practical impact.
+
+
+**************
+Feature Matrix
+**************
+
+Following table summarizes some differences in supported features between 4 tree methods,
+`T` means supported while `F` means unsupported.
+
++------------------+-----------+---------------------+---------------------+------------------------+
+|                  | Exact     | Approx              | Hist                | GPU Hist               |
++==================+===========+=====================+=====================+========================+
+| grow_policy      | Depthwise | depthwise/lossguide | depthwise/lossguide | depthwise/lossguide    |
++------------------+-----------+---------------------+---------------------+------------------------+
+| max_leaves       | F         | T                   | T                   | T                      |
++------------------+-----------+---------------------+---------------------+------------------------+
+| sampling method  | uniform   | uniform             | uniform             | gradient_based/uniform |
++------------------+-----------+---------------------+---------------------+------------------------+
+| categorical data | F         | T                   | T                   | T                      |
++------------------+-----------+---------------------+---------------------+------------------------+
+| External memory  | F         | T                   | P                   | P                      |
++------------------+-----------+---------------------+---------------------+------------------------+
+| Distributed      | F         | T                   | T                   | T                      |
++------------------+-----------+---------------------+---------------------+------------------------+
+
+Features/parameters that are not mentioned here are universally supported for all 4 tree
+methods (for instance, column sampling and constraints).  The `P` in external memory means
+partially supported.  Please note that both categorical data and external memory are
+experimental.
diff --git a/doc/tutorials/feature_interaction_constraint.rst b/doc/tutorials/feature_interaction_constraint.rst
@@ -129,7 +129,7 @@ first and second constraints (``[0, 1]``, ``[2, 3, 4]``).
 
 .. |fig1| image:: ../_static/feature_interaction_illustration2.svg
    :scale: 7%
-   :align: middle  
+   :align: middle
 
 .. |fig2| image:: ../_static/feature_interaction_illustration3.svg
    :scale: 7%
@@ -174,11 +174,6 @@ parameter:
                                      num_boost_round = 1000, evals = evallist,
                                      early_stopping_rounds = 10)
 
-**Choice of tree construction algorithm**. To use feature interaction constraints, be sure
-to set the ``tree_method`` parameter to one of the following: ``exact``, ``hist``,
-``approx`` or ``gpu_hist``.  Support for ``gpu_hist`` and ``approx`` is added only in
-1.0.0.
-
 **************
 Advanced topic
 **************

diff --git a/doc/tutorials/monotonic.rst b/doc/tutorials/monotonic.rst
@@ -2,23 +2,23 @@
 Monotonic Constraints
 #####################
 
-It is often the case in a modeling problem or project that the functional form of an acceptable model is constrained in some way. This may happen due to business considerations, or because of the type of scientific question being investigated.  In some cases, where there is a very strong prior belief that the true relationship has some quality, constraints can be used to improve the predictive performance of the model. 
+It is often the case in a modeling problem or project that the functional form of an acceptable model is constrained in some way. This may happen due to business considerations, or because of the type of scientific question being investigated.  In some cases, where there is a very strong prior belief that the true relationship has some quality, constraints can be used to improve the predictive performance of the model.
 
 A common type of constraint in this situation is that certain features bear a **monotonic** relationship to the predicted response:
 
 .. math::
 
   f(x_1, x_2, \ldots, x, \ldots, x_{n-1}, x_n) \leq f(x_1, x_2, \ldots, x', \ldots, x_{n-1}, x_n)
 
-whenever :math:`x \leq x'` is an **increasing constraint**; or 
+whenever :math:`x \leq x'` is an **increasing constraint**; or
 
 .. math::
 
   f(x_1, x_2, \ldots, x, \ldots, x_{n-1}, x_n) \geq f(x_1, x_2, \ldots, x', \ldots, x_{n-1}, x_n)
 
 whenever :math:`x \leq x'` is a **decreasing constraint**.
 
-XGBoost has the ability to enforce monotonicity constraints on any features used in a boosted model. 
+XGBoost has the ability to enforce monotonicity constraints on any features used in a boosted model.
 
 ****************
 A Simple Example
@@ -60,8 +60,8 @@ Suppose the following code fits your model without monotonicity constraints
 
 .. code-block:: python
 
-  model_no_constraints = xgb.train(params, dtrain, 
-                                   num_boost_round = 1000, evals = evallist, 
+  model_no_constraints = xgb.train(params, dtrain,
+                                   num_boost_round = 1000, evals = evallist,
                                    early_stopping_rounds = 10)
 
 Then fitting with monotonicity constraints only requires adding a single parameter
@@ -71,8 +71,8 @@ Then fitting with monotonicity constraints only requires adding a single paramet
   params_constrained = params.copy()
   params_constrained['monotone_constraints'] = "(1,-1)"
 
-  model_with_constraints = xgb.train(params_constrained, dtrain, 
-                                     num_boost_round = 1000, evals = evallist, 
+  model_with_constraints = xgb.train(params_constrained, dtrain,
+                                     num_boost_round = 1000, evals = evallist,
                                      early_stopping_rounds = 10)
 
 In this example the training data ``X`` has two columns, and by using the parameter values ``(1,-1)`` we are telling XGBoost to impose an increasing constraint on the first predictor and a decreasing constraint on the second.
@@ -82,14 +82,11 @@ Some other examples:
 - ``(1,0)``: An increasing constraint on the first predictor and no constraint on the second.
 - ``(0,-1)``: No constraint on the first predictor and a decreasing constraint on the second.
 
-**Choice of tree construction algorithm**. To use monotonic constraints, be
-sure to set the ``tree_method`` parameter to one of ``exact``, ``hist``, and
-``gpu_hist``.
 
 **Note for the 'hist' tree construction algorithm**.
-If ``tree_method`` is set to either ``hist`` or ``gpu_hist``, enabling monotonic
-constraints may produce unnecessarily shallow trees. This is because the
+If ``tree_method`` is set to either ``hist``, ``approx`` or ``gpu_hist``, enabling
+monotonic constraints may produce unnecessarily shallow trees. This is because the
 ``hist`` method reduces the number of candidate splits to be considered at each
-split. Monotonic constraints may wipe out all available split candidates, in
-which case no split is made. To reduce the effect, you may want to increase
-the ``max_bin`` parameter to consider more split candidates.
+split. Monotonic constraints may wipe out all available split candidates, in which case no
+split is made. To reduce the effect, you may want to increase the ``max_bin`` parameter to
+consider more split candidates.
diff --git a/src/tree/updater_colmaker.cc b/src/tree/updater_colmaker.cc
@@ -174,6 +174,8 @@ class ColMaker: public TreeUpdater {
       std::vector<int> newnodes;
       this->InitData(gpair, *p_fmat);
       this->InitNewNode(qexpand_, gpair, *p_fmat, *p_tree);
+      // We can check max_leaves too, but might break some grid searching pipelines.
+      CHECK_GT(param_.max_depth, 0) << "exact tree method doesn't support unlimited depth.";
       for (int depth = 0; depth < param_.max_depth; ++depth) {
         this->FindSplit(depth, qexpand_, gpair, p_fmat, p_tree);
         this->ResetPosition(qexpand_, p_fmat, *p_tree);

diff --git a/tests/cpp/tree/test_tree_policy.cc b/tests/cpp/tree/test_tree_policy.cc
@@ -20,15 +20,89 @@ class TestGrowPolicy : public ::testing::Test {
             true);
   }
 
-  void TestTreeGrowPolicy(std::string tree_method, std::string policy) {
-    {
-      std::unique_ptr<Learner> learner{Learner::Create({this->Xy_})};
-      learner->SetParam("tree_method", tree_method);
-      learner->SetParam("max_leaves", "16");
-      learner->SetParam("grow_policy", policy);
-      learner->Configure();
+  std::unique_ptr<Learner> TrainOneIter(std::string tree_method, std::string policy,
+                                        int32_t max_leaves, int32_t max_depth) {
+    std::unique_ptr<Learner> learner{Learner::Create({this->Xy_})};
+    learner->SetParam("tree_method", tree_method);
+    if (max_leaves >= 0) {
+      learner->SetParam("max_leaves", std::to_string(max_leaves));
+    }
+    if (max_depth >= 0) {
+      learner->SetParam("max_depth", std::to_string(max_depth));
+    }
+    learner->SetParam("grow_policy", policy);
+
+    auto check_max_leave = [&]() {
+      Json model{Object{}};
+      learner->SaveModel(&model);
+      auto j_tree = model["learner"]["gradient_booster"]["model"]["trees"][0];
+      RegTree tree;
+      tree.LoadModel(j_tree);
+      CHECK_LE(tree.GetNumLeaves(), max_leaves);
+    };
+
+    auto check_max_depth = [&](int32_t sol) {
+      Json model{Object{}};
+      learner->SaveModel(&model);
+
+      auto j_tree = model["learner"]["gradient_booster"]["model"]["trees"][0];
+      RegTree tree;
+      tree.LoadModel(j_tree);
+      bst_node_t depth = 0;
+      tree.WalkTree([&](bst_node_t nidx) {
+        depth = std::max(tree.GetDepth(nidx), depth);
+        return true;
+      });
+      if (sol > -1) {
+        CHECK_EQ(depth, sol);
+      } else {
+        CHECK_EQ(depth, max_depth) << "tree method: " << tree_method << " policy: " << policy
+                                   << " leaves:" << max_leaves << ", depth:" << max_depth;
+      }
+    };
 
+    if (max_leaves == 0 && max_depth == 0) {
+      // unconstrainted
+      if (tree_method != "gpu_hist") {
+        // GPU pre-allocates for all nodes.
+        learner->UpdateOneIter(0, Xy_);
+      }
+    } else if (max_leaves > 0 && max_depth == 0) {
+      learner->UpdateOneIter(0, Xy_);
+      check_max_leave();
+    } else if (max_leaves == 0 && max_depth > 0) {
+      learner->UpdateOneIter(0, Xy_);
+      check_max_depth(-1);
+    } else if (max_leaves > 0 && max_depth > 0) {
       learner->UpdateOneIter(0, Xy_);
+      check_max_leave();
+      check_max_depth(2);
+    } else if (max_leaves == -1 && max_depth == 0) {
+      // default max_leaves is 0, so both of them are now 0
+    } else {
+      // default parameters
+      learner->UpdateOneIter(0, Xy_);
+    }
+    return learner;
+  }
+
+  void TestCombination(std::string tree_method) {
+    for (auto policy : {"depthwise", "lossguide"}) {
+      // -1 means default
+      for (auto leaves : {-1, 0, 3}) {
+        for (auto depth : {-1, 0, 3}) {
+          this->TrainOneIter(tree_method, policy, leaves, depth);
+        }
+      }
+    }
+  }
+
+  void TestTreeGrowPolicy(std::string tree_method, std::string policy) {
+    {
+      /**
+       *  max_leaves
+       */
+      auto learner = this->TrainOneIter(tree_method, policy, 16, -1);
       Json model{Object{}};
       learner->SaveModel(&model);
 
@@ -38,13 +112,10 @@ class TestGrowPolicy : public ::testing::Test {
       ASSERT_EQ(tree.GetNumLeaves(), 16);
     }
     {
-      std::unique_ptr<Learner> learner{Learner::Create({this->Xy_})};
-      learner->SetParam("tree_method", tree_method);
-      learner->SetParam("max_depth", "3");
-      learner->SetParam("grow_policy", policy);
-      learner->Configure();
-
-      learner->UpdateOneIter(0, Xy_);
+      /**
+       *  max_depth
+       */
+      auto learner = this->TrainOneIter(tree_method, policy, -1, 3);
       Json model{Object{}};
       learner->SaveModel(&model);
 
@@ -64,17 +135,23 @@ class TestGrowPolicy : public ::testing::Test {
 TEST_F(TestGrowPolicy, Approx) {
   this->TestTreeGrowPolicy("approx", "depthwise");
   this->TestTreeGrowPolicy("approx", "lossguide");
+
+  this->TestCombination("approx");
 }
 
 TEST_F(TestGrowPolicy, Hist) {
   this->TestTreeGrowPolicy("hist", "depthwise");
   this->TestTreeGrowPolicy("hist", "lossguide");
+
+  this->TestCombination("hist");
 }
 
 #if defined(XGBOOST_USE_CUDA)
 TEST_F(TestGrowPolicy, GpuHist) {
   this->TestTreeGrowPolicy("gpu_hist", "depthwise");
   this->TestTreeGrowPolicy("gpu_hist", "lossguide");
+
+  this->TestCombination("gpu_hist");
 }
 #endif  // defined(XGBOOST_USE_CUDA)
 }  // namespace xgboost
diff --git a/tests/python-gpu/test_gpu_prediction.py b/tests/python-gpu/test_gpu_prediction.py
@@ -22,7 +22,7 @@ def noop(*args, **kwargs):
 rng = np.random.RandomState(1994)
 
 shap_parameter_strategy = strategies.fixed_dictionaries({
-    'max_depth': strategies.integers(0, 11),
+    'max_depth': strategies.integers(1, 11),
     'max_leaves': strategies.integers(0, 256),
     'num_parallel_tree': strategies.sampled_from([1, 10]),
 }).filter(lambda x: x['max_depth'] > 0 or x['max_leaves'] > 0)