Random Selection of Variables for Aggregated Tree-Based Models
Abstract
Tree-based models are popular and widely used because they are simple, flexible,
and powerful tools for classification. Unfortunately, they are not stable classifiers.
Significant improvements in model stability and prediction accuracy can be obtained
by aggregating multiple classification trees. Methods proposed to date, i.e. bagging, adaptive bagging,
and arcing, are based on sampling cases from the training set, while boosting uses a system
of case weights. The result is called a committee of trees, an ensemble, or a forest.
Recent developments in this field have shown that randomization (random selection of variables)
in aggregated tree-based classifiers leads to consistent models, while boosting can
overfit.
In this paper we discuss optimal parameter values for the method of random selection
of variables (RandomForest) in an aggregated tree-based model, i.e. the number of trees in the
forest and the number of variables considered at each split.
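The per-split randomization discussed above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `candidate_features` and the default of floor(sqrt(p)) candidate variables (a common choice for classification forests) are assumptions introduced here.

```python
import math
import random

def candidate_features(n_features, mtry=None, seed=None):
    """Randomly pick the subset of feature indices considered at one tree split.

    mtry defaults to floor(sqrt(n_features)), a commonly used value for
    classification forests (an illustrative assumption, not from the paper).
    """
    rng = random.Random(seed)
    if mtry is None:
        mtry = max(1, int(math.sqrt(n_features)))
    # Sample mtry distinct feature indices without replacement.
    return rng.sample(range(n_features), mtry)

# Example: with 10 features the default mtry is 3, so 3 candidate
# variables are drawn for this split.
subset = candidate_features(10, seed=0)
print(subset)
```

At every split of every tree a fresh subset is drawn, which decorrelates the trees in the forest; the paper's question is how large this subset (and the forest itself) should be.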