Random Selection of Variables for Aggregated Tree-Based Models

Gatnar, Eugeniusz; Rozmus, Dorota

View/Open

foe196_Gatnar_Rozmus_103_111.pdf (997.3Kb)

Date

2006

Abstract

Tree-based models are popular and widely used because they are simple, flexible and powerful tools for classification. Unfortunately, they are not stable classifiers. The stability and prediction accuracy of the model can be improved significantly by aggregating multiple classification trees. Proposed methods (bagging, adaptive bagging, and arcing) are based on sampling cases from the training set, while boosting uses a system of weights for cases. The result is called a committee of trees, an ensemble, or a forest. Recent developments in this field have shown that randomization (random selection of variables) in aggregated tree-based classifiers leads to consistent models, while boosting can overfit. In this paper we discuss optimal parameter values for the method of random selection of variables (RandomForest) for an aggregated tree-based model, i.e. the number of trees in the forest and the number of variables selected for each split.
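
The two parameters named in the abstract can be illustrated with a minimal sketch of a forest that draws a random subset of variables at each split. This is not the authors' code or the RandomForest software itself; it is a simplified pure-Python illustration. The names `grow_forest`, `n_trees` and `mtry`, the Gini splitting criterion, the depth limit, and the default of roughly the square root of the number of variables are all assumptions made for the example, not values taken from the paper. Class labels are assumed to be plain values such as integers or strings (not tuples).

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Best (feature, threshold) among the candidate features, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in features:
        # Dropping the largest value keeps both child nodes non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, mtry, rng, depth=0, max_depth=5):
    """Grow one tree; a leaf is a label, an inner node is a 4-tuple."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    # Random selection of variables: only mtry candidates at this split.
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      mtry, rng, depth + 1, max_depth),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      mtry, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def grow_forest(X, y, n_trees=25, mtry=None, seed=0):
    """Aggregate n_trees trees, each grown on a bootstrap sample."""
    rng = random.Random(seed)
    p = len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # assumed default, not from the paper
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                mtry, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the committee of trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

In this sketch, `n_trees` corresponds to the number of trees in the forest and `mtry` to the number of variables selected for each split. Comparing error on held-out data across several values of each is one way to explore the kind of parameter choice the abstract refers to.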

URI

http://hdl.handle.net/11089/17540

Collections

Acta Universitatis Lodziensis. Folia Oeconomica nr 196/2006