A Monte Carlo investigation of two distance measures between statistical populations and their application to cluster analysis
Abstract
The paper presents a simulation study of a well-known hierarchical
cluster analysis method applied to the classification of statistical populations.
In particular, the problem of clustering univariate normal populations is studied.
Two measures of the distance between statistical populations are considered: the
Mahalanobis distance, which is defined for normally distributed populations
under the assumption that the covariance matrices are equal, and the Kullback-Leibler
divergence (the so-called Generalized Mahalanobis Distance), whose use
extends to populations of any distribution.
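As a rough illustration of how the two measures relate in the univariate normal case, the sketch below computes the squared Mahalanobis distance under a common variance and a symmetrized Kullback-Leibler divergence. The function names are hypothetical, and the symmetrized form (the sum of both directed divergences) is an assumption; the abstract does not state which exact formula the paper uses.

```python
import math


def mahalanobis_sq(m1, m2, var):
    """Squared Mahalanobis distance between N(m1, var) and N(m2, var)."""
    return (m1 - m2) ** 2 / var


def kl_normal(m1, v1, m2, v2):
    """Directed Kullback-Leibler divergence KL(N(m1, v1) || N(m2, v2))."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def symmetric_kl(m1, v1, m2, v2):
    """Symmetrized divergence (assumed form): sum of both directed divergences."""
    return kl_normal(m1, v1, m2, v2) + kl_normal(m2, v2, m1, v1)


# With equal variances the two measures coincide.
print(mahalanobis_sq(0.0, 2.0, 1.0), symmetric_kl(0.0, 1.0, 2.0, 1.0))  # 4.0 4.0
```

Under this assumed form, the divergence reduces to the squared Mahalanobis distance when the variances are equal, which is consistent with the "generalized" label above.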
The simulation study concerns a set of 15 univariate normal populations
whose variances are changed in successive steps. The aim is to study the robustness
of the nearest neighbour method to departures from the variance equality assumption
when the Mahalanobis distance formula is applied. The differences between the two cluster
families, obtained for the same set of populations but with different distance
matrices, are studied. The distance between the two final cluster sets is measured
by means of the Marczewski-Steinhaus distance.
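The comparison described above might be sketched as follows. This is only an illustration under several assumptions that are not taken from the paper: the population means and variances are made up, the Mahalanobis formula uses the average of the two variances as the common variance, the Kullback-Leibler divergence is symmetrized, and a "cluster family" is taken to be the set of all clusters formed during nearest neighbour (single linkage) agglomeration. The Marczewski-Steinhaus distance between two sets A and B is |A Δ B| / |A ∪ B|. All function names are hypothetical.

```python
from itertools import combinations


def distance_matrices(means, variances):
    """Build the Mahalanobis-type and symmetrized KL distance matrices."""
    n = len(means)
    maha = [[0.0] * n for _ in range(n)]
    kl = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d2 = (means[i] - means[j]) ** 2
        vi, vj = variances[i], variances[j]
        pooled = (vi + vj) / 2  # assumption: average variance as common variance
        maha[i][j] = maha[j][i] = d2 / pooled
        kl[i][j] = kl[j][i] = (vi + d2) / (2 * vj) + (vj + d2) / (2 * vi) - 1.0
    return maha, kl


def single_linkage_family(dist):
    """Return every cluster (frozenset of indices) formed by nearest neighbour merging."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    family = set(clusters)
    while len(clusters) > 1:
        # Merge the two clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda p: min(dist[i][j] for i in p[0] for j in p[1]),
        )
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        family.add(merged)
    return family


def marczewski_steinhaus(a, b):
    """Marczewski-Steinhaus distance: |A symmetric-difference B| / |A union B|."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


def compare(means, variances):
    """Distance between the cluster families produced by the two matrices."""
    maha, kl = distance_matrices(means, variances)
    return marczewski_steinhaus(
        single_linkage_family(maha), single_linkage_family(kl)
    )


means = [i * (i + 1) / 2 for i in range(15)]  # hypothetical means, 15 populations
print(compare(means, [1.0] * 15))  # equal variances: identical families, 0.0
print(compare(means, [1.0 + 0.5 * i for i in range(15)]))  # unequal variances
```

With equal variances both matrices coincide, so the families are identical and the distance is zero; letting the variances drift apart step by step and tracking this value mimics the robustness question the study asks.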