Clustering in the field of social sciences: that is your choice

Jaime R.S. Fonseca

doi:10.1080/13645579.2012.716973

Abstract

Clustering seeks to identify a finite set of clusters that describe the data. Cluster analysis partitions similar objects into meaningful classes when both the number of classes and their composition must be determined. Latent class models (LCM) are now frequently used for cluster analysis; they provide a useful probabilistic, statistical method for grouping observations into clusters. In this approach to clustering, each cluster in the population is assumed to be described by its own probability distribution; these distributions may belong to the same family but differ in their parameter values. The goal of this research is to compare cluster analysis with LCM. Methodologically, we considered three data-sets: one with only continuous variables, one with only binary variables and one with mixed variables. In all situations, LCM performed reasonably well; in contrast, cluster analysis achieved both the best performance (90.7%, continuous variables only) and the worst (40%, mixed variables).
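
The model-based idea described above, in which each cluster has its own distribution from a common family, can be illustrated with a minimal sketch. It assumes two clusters, a single continuous variable and normal component distributions, fitted by the EM algorithm. The function names and the toy data are hypothetical and are not taken from the study; the data-sets and software used in the paper are not reproduced here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(data, n_iter=100):
    """EM for a 1-D mixture of two normal distributions (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    half = n // 2
    # Crude starting values: means of the lower and upper halves of the data,
    # a common standard deviation, and equal mixing weights.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    overall_mean = sum(data) / n
    sd0 = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    sds = [sd0, sd0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs
        # to each cluster, given the current parameter values.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update weights, means and standard deviations using
        # the posterior probabilities as soft cluster memberships.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a degenerate component
    return weights, means, sds, resp


# Hypothetical toy data: two well-separated groups of five observations.
toy_data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, sds, resp = fit_two_component_mixture(toy_data)

# Assign each observation to the cluster with the highest posterior probability.
labels = [0 if r[0] >= r[1] else 1 for r in resp]
print(weights, means, sds, labels)
```

Unlike a distance-based partition, this kind of fit returns a membership probability for every observation, and the same scheme extends to binary or mixed variables by changing the component distribution.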

Full Text