Add a facility that allows random forest classifiers to be combined after training #26326
Labels
module:ensemble
Needs Decision - Include Feature
New Feature
Describe the workflow you want to enable
In a federated environment, I have federation elements that build private random forest classifiers, which I would like to combine after the fact into a single random forest.
Describe your proposed solution
See the "alternatives" section.
Describe alternatives you've considered, if relevant
Stacking might suffice as a work-around, although I'd like to avoid that.
As a throw-away experiment, simply concatenating all the constituent decision tree estimators into a common estimators_ array (and adjusting the count) seems to work superficially, but clearly isn't good practice.
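For concreteness, the throw-away experiment could look roughly like the sketch below. `combine_forests` is a hypothetical helper, not a scikit-learn API. It assumes every forest was fitted with identical `classes_` and the same number of features, and it drops the OOB attributes because they would describe only one forest's training data.

```python
import copy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def combine_forests(forests):
    """Hypothetical helper: pool the trees of already-fitted forests.

    Assumes all forests share identical classes_ and n_features_in_.
    """
    first = forests[0]
    for forest in forests[1:]:
        if not np.array_equal(forest.classes_, first.classes_):
            raise ValueError("forests disagree on classes_")
        if forest.n_features_in_ != first.n_features_in_:
            raise ValueError("forests disagree on n_features_in_")

    combined = copy.deepcopy(first)
    # Concatenate the constituent trees and adjust the count to match.
    combined.estimators_ = [tree for forest in forests for tree in forest.estimators_]
    combined.n_estimators = len(combined.estimators_)
    # OOB statistics refer to a single forest's training set, so discard them.
    for attr in ("oob_score_", "oob_decision_function_"):
        if hasattr(combined, attr):
            delattr(combined, attr)
    return combined


# Two "federation elements", each training on its own private slice.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rf_a = RandomForestClassifier(n_estimators=10, random_state=1).fit(X[:150], y[:150])
rf_b = RandomForestClassifier(n_estimators=15, random_state=2).fit(X[150:], y[150:])

merged = combine_forests([rf_a, rf_b])
proba = merged.predict_proba(X)
```

Because a forest's `predict_proba` is the mean over its trees, the merged prediction here equals the tree-count-weighted average of the two original forests' predictions.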
In addition, this approach can fail if, say, we try to combine forest #1, whose classes_ is [dog, cat], with forest #2, whose classes_ is [cow, dog, cat]. To address that concern, I looked at forcing the union of all possible classes (over all the forests) into the resulting combined forest and its underlying trees. This appears to work at some level, but it doesn't handle a misshapen oob_decision_function_, which is shaped according to n_classes_.
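One way to sidestep the mismatch without touching the trees is to align each forest's `predict_proba` output to the union of classes and average at prediction time. This is a different technique from merging the trees themselves, and the sketch below is illustrative only: `align_proba`, the probability values, and the tree counts are made up, and it assumes each `classes_` array is sorted.

```python
import numpy as np


def align_proba(proba, classes, all_classes):
    """Hypothetical helper: spread proba columns over the union of classes.

    Classes a forest never saw get probability 0. Assumes all_classes is sorted.
    """
    aligned = np.zeros((proba.shape[0], len(all_classes)))
    columns = np.searchsorted(all_classes, classes)
    aligned[:, columns] = proba
    return aligned


# Stand-ins for forest_1.classes_ / forest_2.classes_ and their predict_proba output.
classes_1 = np.array(["cat", "dog"])
classes_2 = np.array(["cat", "cow", "dog"])
proba_1 = np.array([[0.2, 0.8]])
proba_2 = np.array([[0.1, 0.3, 0.6]])
n_trees = np.array([10, 30])  # weight each forest by its number of trees

all_classes = np.union1d(classes_1, classes_2)  # sorted union of labels
stacked = np.stack([
    align_proba(proba_1, classes_1, all_classes),
    align_proba(proba_2, classes_2, all_classes),
])
combined = np.average(stacked, axis=0, weights=n_trees)
```

Weighting by tree count makes the result match what a per-tree average over the pooled trees would give, under the assumption that a tree assigns zero probability to classes it never saw.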
Another approach to dealing with classes_ heterogeneity is to make sure each federation forest is exposed to the full gamut of potential classes during training. (Even then, one worries about the order of the elements in classes_: [dog, cat] vs. [cat, dog].) The training "gamut" is a minimal set of X records that produce all possible categorical y values. It appears that classes_ is constructed before any bootstrap sampling, so, assuming we can rely on that implementation detail, and provided each federation element prepends the same consistently ordered "gamut" to its X, we can (hopefully) expect all forest instances to have identical classes_, with the same elements in the same order. That, in turn, would make combining the forests easier. Ensuring complete exposure via the "gamut" might also affect accuracy.
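A minimal sketch of the "gamut" idea follows. The all-zero gamut rows are placeholders (real ones would need to be plausible records for each class), and `fit_with_gamut` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Shared "gamut": one record per possible class, identical for every federation element.
# The zero feature vectors are placeholders chosen only for illustration.
gamut_X = np.zeros((3, 2))
gamut_y = np.array(["cat", "cow", "dog"])


def fit_with_gamut(X, y, **kwargs):
    """Hypothetical helper: prepend the gamut rows, then fit a forest."""
    X_aug = np.vstack([gamut_X, X])
    y_aug = np.concatenate([gamut_y, y])
    return RandomForestClassifier(**kwargs).fit(X_aug, y_aug)


rng = np.random.default_rng(0)
# Element 1 only ever observes dogs and cats; element 2 observes all three classes.
X1, y1 = rng.normal(size=(40, 2)), rng.choice(["dog", "cat"], size=40)
X2, y2 = rng.normal(size=(40, 2)), rng.choice(["cow", "dog", "cat"], size=40)

rf_1 = fit_with_gamut(X1, y1, n_estimators=5, random_state=0)
rf_2 = fit_with_gamut(X2, y2, n_estimators=5, random_state=0)

same_classes = np.array_equal(rf_1.classes_, rf_2.classes_)
```

In current scikit-learn releases `classes_` appears to come out in sorted order regardless of the order in which labels occur in y, which would also settle the [dog, cat] vs. [cat, dog] worry, but that is an observation about present behaviour rather than a documented guarantee.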
Additional context
R provides a combine() operator.