Comparative Assessment of Some Selected Methods of Determining the Number of Clusters in a Data Set

Korzeniewski, Jerzy

dc.contributor.author	Korzeniewski, Jerzy
dc.date.accessioned	2016-02-01T12:11:23Z
dc.date.available	2016-02-01T12:11:23Z
dc.date.issued	2007
dc.identifier.issn	0208-6018
dc.identifier.uri	http://hdl.handle.net/11089/16838
dc.description.abstract	This paper compares the performance of an algorithm for determining the number of clusters in a data set, proposed by the author, with that of other methods of determining the number of clusters. The new algorithm is based on comparing pseudo cumulative distribution functions of a certain random variable. For a fixed window size we draw K different points from the data set and, for every point, find the corresponding limiting point of the mean shift procedure. Then we check whether the distance (e.g. Euclidean) between every pair of limiting points is greater than the window size. Analogously, we determine the pseudo cumulative distribution functions for different numbers K of clusters. Of all the pseudo cumulative distribution functions we pick the proper one, i.e. the "last" one (with respect to K) that has a horizontal phase. Other methods of determining the number of clusters in a data set are compared with the proposed algorithm on a number of two-dimensional example data sets, for two clustering methods (k-means clustering and minimum distance agglomeration).	pl_PL
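The random-variable construction in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the mean shift uses a flat (uniform) kernel of radius h, the helper names `mean_shift_limit`, `all_modes_separated` and `pseudo_cdf` are hypothetical, and the "pseudo cumulative distribution function" is read here as the share of random draws in which, at window size h, at least one pair of the K limiting points lies within h of each other (a share that tends to grow with h, as a CDF does). The paper's exact definition may differ.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean_shift_limit(data: Sequence[Point], start: Point, h: float,
                     tol: float = 1e-9, max_iter: int = 500) -> Point:
    """Flat-kernel mean shift (an assumption): repeatedly move x to the mean
    of all data points within distance h of x until the move is negligible."""
    x = tuple(start)
    for _ in range(max_iter):
        window = [p for p in data if math.dist(p, x) <= h]
        if not window:  # defensive; does not happen when starting at a data point
            break
        # Coordinate-wise mean of the points inside the window.
        new_x = tuple(sum(coords) / len(window) for coords in zip(*window))
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def all_modes_separated(data: Sequence[Point], k: int, h: float,
                        rng: random.Random) -> bool:
    """Draw k distinct data points, run mean shift from each, and report
    whether every pair of limiting points is more than h apart."""
    starts = rng.sample(list(data), k)
    limits = [mean_shift_limit(data, s, h) for s in starts]
    return all(math.dist(a, b) > h
               for i, a in enumerate(limits)
               for b in limits[i + 1:])


def pseudo_cdf(data: Sequence[Point], k: int, window_sizes: Sequence[float],
               trials: int = 200, seed: int = 0) -> List[float]:
    """For each window size h, the share of random draws in which NOT all
    k limiting points are more than h apart. This is a hypothetical reading
    of the paper's pseudo cumulative distribution function."""
    rng = random.Random(seed)
    curve = []
    for h in window_sizes:
        failures = sum(not all_modes_separated(data, k, h, rng)
                       for _ in range(trials))
        curve.append(failures / trials)
    return curve
```

Evaluating `pseudo_cdf` for K = 2, 3, ... over one common grid of window sizes yields one curve per K; those curves are what the selection rule compares. A flat kernel was chosen only because it makes "window size" literal (the radius of the neighbourhood being averaged).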
dc.description.abstract	This article is an attempt at a comparative assessment of an algorithm, proposed by the author, for determining the number of clusters in a data set against other methods of determining the number of clusters. The author's algorithm is based on comparing pseudo cumulative distribution functions of a certain random variable for different numbers of clusters. This random variable is defined as follows. For a fixed window size we draw K different points from the data set and, for each of them, find the corresponding limiting point in the mean shift procedure. Next, we check whether the distance (e.g. Euclidean) between every pair of limiting points is greater than the window size. Analogously, we determine the pseudo cumulative distribution functions for different numbers K of clusters. Of all the distribution functions, we take as correctly indicating the number of clusters the one corresponding to the last curve (with respect to K) that has a horizontal phase. Other methods of determining the number of clusters in a data set are compared with the proposed algorithm on several two-dimensional example data sets, for two clustering methods of diametrically different nature.	pl_PL
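The selection rule ("the last curve, with respect to K, that has a horizontal phase") needs an operational test for a horizontal phase, which the abstract does not spell out. The sketch below assumes a hypothetical criterion: a curve has a horizontal phase if some run of `min_len` consecutive values varies by at most `tol` and lies strictly between `low` and `high`. The function names and all threshold values are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Optional, Sequence


def has_horizontal_phase(curve: Sequence[float], tol: float = 0.05,
                         min_len: int = 3, low: float = 0.02,
                         high: float = 0.98) -> bool:
    """Hypothetical plateau test: some run of `min_len` consecutive values
    varies by at most `tol` and lies strictly between `low` and `high`."""
    for i in range(len(curve) - min_len + 1):
        segment = curve[i:i + min_len]
        # Flat enough, and not merely stuck at the 0 or 1 boundary.
        if (max(segment) - min(segment) <= tol
                and low < min(segment) and max(segment) < high):
            return True
    return False


def choose_number_of_clusters(curves: Dict[int, Sequence[float]],
                              **plateau_kwargs) -> Optional[int]:
    """Return the largest K whose curve has a horizontal phase, or None
    if no curve qualifies."""
    candidates = [k for k, curve in curves.items()
                  if has_horizontal_phase(curve, **plateau_kwargs)]
    return max(candidates) if candidates else None
```

Requiring the plateau level to lie strictly between 0 and 1 ignores curves that are flat only because they have already saturated; under the reading used here, that is how a curve for K larger than the true number of clusters would tend to look.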
dc.description.sponsorship	The task "Digitization and publication in the University of Łódź Digital Repository of the collection of scientific journals published by the University of Łódź", no. 885/P-DUN/2014, was co-financed by MNiSW (the Polish Ministry of Science and Higher Education) as part of its science dissemination activities.	pl_PL
dc.language.iso	en	pl_PL
dc.publisher	Wydawnictwo Uniwersytetu Łódzkiego	pl_PL
dc.relation.ispartofseries	Acta Universitatis Lodziensis. Folia Oeconomica;206
dc.subject	cluster analysis	pl_PL
dc.subject	number of clusters	pl_PL
dc.subject	computer algorithm	pl_PL
dc.subject	mean shift method	pl_PL
dc.title	Comparative Assessment of Some Selected Methods of Determining the Number of Clusters in a Data Set	pl_PL
dc.title.alternative	Ocena porównawcza wybranych metod wyznaczających ilość skupień w zbiorze danych	pl_PL
dc.type	Article	pl_PL
dc.rights.holder	© Copyright by Wydawnictwo Uniwersytetu Łódzkiego, Łódź 2007	pl_PL
dc.page.number	177-187	pl_PL
dc.contributor.authorAffiliation	University of Łódź, Department of Statistical Methods	pl_PL
dc.references	Comaniciu D., Meer P. (1999), Mean Shift Analysis and Applications, IEEE Int. Conf. Computer Vision (ICCV'99), Kerkyra, Greece, 1197-1203.	pl_PL
dc.references	Gordon A. D. (1999), Classification, Chapman & Hall, New York.	pl_PL
dc.references	Sugar C. A., James G. M. (2003), Finding the Number of Clusters in a Dataset: An Information-Theoretic Approach, JASA, 98, 750-763.	pl_PL

This item appears in the following collection: Acta Universitatis Lodziensis. Folia Oeconomica nr 206/2007