Random Selection of Variables for Aggregated Tree-Based Models

Gatnar, Eugeniusz; Rozmus, Dorota

dc.contributor.author	Gatnar, Eugeniusz
dc.contributor.author	Rozmus, Dorota
dc.date.accessioned	2016-03-24T13:49:35Z
dc.date.available	2016-03-24T13:49:35Z
dc.date.issued	2006
dc.identifier.issn	0208-6018
dc.identifier.uri	http://hdl.handle.net/11089/17540
dc.description	The project "Digitisation and provision of access, in the Digital Repository of the University of Łódź, to the collection of scientific journals published by the University of Łódź", no. 885/P-DUN/2014, was co-financed by the Ministry of Science and Higher Education (MNiSW) as part of its science dissemination activities.	pl_PL
dc.description.abstract	Tree-based models are popular and widely used because they are simple, flexible and powerful tools for classification. Unfortunately, they are not stable classifiers. A significant improvement in model stability and prediction accuracy can be obtained by aggregating multiple classification trees. The proposed methods, i.e. bagging, adaptive bagging, and arcing, are based on sampling cases from the training set, while boosting uses a system of weights for cases. The result is called a committee of trees, an ensemble, or a forest. Recent developments in this field have shown that randomization (random selection of variables) in aggregated tree-based classifiers leads to consistent models, while boosting can overfit. In this paper we discuss optimal parameter values for the method of random selection of variables (RandomForest) for an aggregated tree-based model (i.e. the number of trees in the forest and the number of variables selected for each split).	pl_PL
dc.description.sponsorship	Classification trees, owing to their simplicity, flexibility and effectiveness, are becoming an increasingly widely used classification method. Despite its many advantages, the method has a drawback: a lack of stability. Stability and prediction accuracy can be improved by aggregating many classification trees into a single model. The aggregation methods proposed in the literature, such as bagging, adaptive bagging and arcing, rely on sampling objects from the training set, whereas boosting additionally uses a system of weights. The result is a set of classification trees that together form an aggregated model. Because sampling objects can change the distribution of the variables in the training set, prediction accuracy can be improved by randomly selecting the variables for the training samples on which the component models of the aggregate are built. This article considers the estimation of optimal parameter values for the RandomForest procedure, which implements random selection of variables for a model in the form of a set of aggregated classification trees.	pl_PL
dc.language.iso	en	pl_PL
dc.publisher	Wydawnictwo Uniwersytetu Łódzkiego	pl_PL
dc.relation.ispartofseries	Acta Universitatis Lodziensis. Folia Oeconomica;196
dc.subject	Tree-based models	pl_PL
dc.subject	aggregation	pl_PL
dc.subject	RandomForest	pl_PL
dc.title	Random Selection of Variables for Aggregated Tree-Based Models	pl_PL
dc.title.alternative	Zastosowanie losowego doboru zmiennych w agregacji drzew klasyfikacyjnych [Application of Random Variable Selection in the Aggregation of Classification Trees]	pl_PL
dc.type	Article	pl_PL
dc.rights.holder	© Copyright by Wydawnictwo Uniwersytetu Łódzkiego, Łódź 2006	pl_PL
dc.page.number	103-111	pl_PL
dc.contributor.authorAffiliation	The Karol Adamiecki University of Economics, Katowice, Department of Statistics	pl_PL
dc.references	Blake C., Keogh E., Merz C. J. (1998), UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, CA.	pl_PL
dc.references	Bauer E., Kohavi R. (1999), “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants”, Machine Learning, 36, 105-142.	pl_PL
dc.references	Breiman L. (2003), Manual on Setting Up, Using and Understanding Random Forest, http://oz.berkeley.edu/users/breiman/UsingrandomforestsV3.1.	pl_PL
dc.references	Breiman L. (2001), “Random Forests”, Machine Learning, 45, 5-32.	pl_PL
dc.references	Breiman L. (1999), “Using Adaptive Bagging to Debias Regressions”, Technical Report 547, Statistics Department, University of California, Berkeley.	pl_PL
dc.references	Breiman L. (1998), “Arcing Classifiers”, Annals of Statistics, 26, 801-849.	pl_PL
dc.references	Breiman L. (1996), “Bagging Predictors”, Machine Learning, 24, 123-140.	pl_PL
dc.references	Dietterich T., Kong E. (1995), “Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms”, Technical Report, Department of Computer Science, Oregon State University.	pl_PL
dc.references	Freund Y., Schapire R. E. (1997), “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”, Journal of Computer and System Sciences, 55, 119-139.	pl_PL
dc.references	Gatnar E. (2001), Nonparametric Method for Discrimination and Regression, (in Polish) Wydawnictwo Naukowe PWN, Warszawa.	pl_PL
dc.references	Ho T. K. (1998), “The Random Subspace Method for Constructing Decision Forests”, IEEE Trans. on Pattern Analysis and Machine Intelligence, 20, 832-844.	pl_PL
dc.references	Quinlan J. R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo.	pl_PL
dc.references	Wolpert D. (1992), “Stacked Generalization”, Neural Networks, 5, 241-259.	pl_PL

Files in this item

Name: foe196_Gatnar_Rozmus_103_111.pdf
Size: 997.3KB
Format: PDF


This item appears in the following collections

Acta Universitatis Lodziensis. Folia Oeconomica nr 196/2006 [28]
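
The abstract in this record contrasts two sources of randomness in aggregated trees: sampling cases from the training set (bagging) and randomly selecting variables for each split, controlled by the number of trees and the number of variables selected. The following is a minimal Python sketch of those sampling steps only, not the authors' procedure or any library's implementation. All function names and the parameter values are hypothetical, and no tree is actually grown.

```python
import random
from collections import Counter


def bootstrap_sample(n_cases, rng):
    """Draw n_cases case indices with replacement (the sampling step of bagging)."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]


def random_variable_subset(n_variables, n_selected, rng):
    """Pick n_selected distinct variable indices as candidates for one split."""
    return sorted(rng.sample(range(n_variables), n_selected))


def majority_vote(predictions):
    """Aggregate the class labels predicted by the individual trees."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical settings for illustration; these are not values from the paper.
rng = random.Random(0)
n_cases, n_variables = 10, 9
n_trees, n_selected = 5, 3

for tree in range(n_trees):
    cases = bootstrap_sample(n_cases, rng)
    # Drawn once per tree here for brevity; a random forest would redraw
    # the candidate variables at every split of every tree.
    candidates = random_variable_subset(n_variables, n_selected, rng)
    print(f"tree {tree}: cases={cases} split candidates={candidates}")

# Combine hypothetical per-tree predictions for one case.
print(majority_vote(["a", "b", "a", "a", "b"]))
```

In this sketch, `n_trees` and `n_selected` stand in for the two parameters the abstract names: the number of trees in the forest and the number of variables selected for each split.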
