Random Selection of Variables for Aggregated Tree-Based Models
Tree-based models are popular and widely used because they are simple, flexible and powerful tools for classification. Unfortunately, they are not stable classifiers. A significant improvement in model stability and prediction accuracy can be obtained by aggregating multiple classification trees. The proposed methods, e.g. bagging, adaptive bagging and arcing, are based on sampling cases from the training set, while boosting uses a system of weights for cases. The result is called a committee of trees, an ensemble or a forest. Recent developments in this field have shown that randomization (random selection of variables) in aggregated tree-based classifiers leads to consistent models, while boosting can overfit. In this paper we discuss optimal values of the parameters of the method of random selection of variables (RandomForest) for an aggregated tree-based model, i.e. the number of trees in the forest and the number of variables selected for each split.
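The two parameters named above can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: to stay short it assumes one-split trees (stumps) instead of fully grown classification trees, and the names fit_stump, fit_forest, predict, n_trees and mtry are hypothetical, chosen only for this illustration. Each tree is fitted on a bootstrap sample of the cases and may split only on mtry randomly selected variables; the forest predicts by majority vote.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent class label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, rows, variables):
    """Find the single split (variable, threshold) with the fewest
    misclassified cases, searching only the candidate variables."""
    best = None
    for j in variables:
        for t in sorted({X[i][j] for i in rows}):
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            if not left or not right:
                continue  # not a real split
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(v != l_lab for v in left)
                      + sum(v != r_lab for v in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:
        # No valid split exists: always predict the majority class.
        lab = majority([y[i] for i in rows])
        return (0, float("inf"), lab, lab)
    return best[1:]


def fit_forest(X, y, n_trees, mtry, seed=0):
    """Aggregate n_trees stumps; each one sees a bootstrap sample of
    the cases and mtry randomly selected variables."""
    rng = random.Random(seed)
    n_cases, n_variables = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_cases) for _ in range(n_cases)]  # bootstrap
        variables = rng.sample(range(n_variables), mtry)  # random selection
        forest.append(fit_stump(X, y, rows, variables))
    return forest


def predict(forest, x):
    """Majority vote over the trees in the forest."""
    votes = [l_lab if x[j] <= t else r_lab for j, t, l_lab, r_lab in forest]
    return majority(votes)


# Toy data (made up for illustration): variable 0 carries the signal.
X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.9, 5.5], [0.8, 3.5], [0.7, 4.5]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, mtry=1, seed=42)
print([predict(forest, x) for x in X])
```

In this sketch, n_trees corresponds to the number of trees in the forest and mtry to the number of variables selected for each split. A common default for classification is mtry close to the square root of the number of variables, but suitable values depend on the data set.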