Abstract

The estimation of mixture models is a well-known approach to cluster analysis, and several criteria have been proposed to select the number of clusters. In this paper, we consider mixture models with side-information, which imposes the constraint that certain groups of data are known to originate from the same source. The usual criteria are then no longer suitable. An EM (Expectation-Maximization) algorithm was previously developed to jointly determine the model parameters and the data labelling for a given number of clusters. In this work we adapt three usual criteria, namely the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the normalized entropy criterion (NEC), so that they take the side-information into account. One simulated problem and two real data sets are used to show the relevance of the modified criteria and to compare them. The efficiency of both the EM algorithm and the criteria, in selecting the right number of clusters while achieving a good clustering, depends on the amount of side-information. Since side-information is mainly useful when the clusters overlap, the best criterion is the modified BIC.
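For orientation, the three criteria in their usual, unmodified forms can be sketched as below. This is a minimal illustration and not the modified versions proposed in the paper, whose exact form the abstract does not give. The function and argument names are hypothetical; the log-likelihoods and the classification entropy are assumed to come from an EM fit performed elsewhere. Lower values are better for all three.

```python
import math


def aic(log_likelihood, n_params):
    """Standard AIC: -2 log L + 2 * (number of free parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: -2 log L + (number of free parameters) * log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def nec(log_likelihood_k, log_likelihood_1, entropy_k):
    """Standard NEC for K > 1 clusters: E(K) / (L(K) - L(1)).

    entropy_k is the entropy of the posterior membership probabilities,
    -sum_i sum_k t_ik * log(t_ik); log_likelihood_1 is the log-likelihood
    of the one-cluster model. The ratio is undefined for K = 1.
    """
    return entropy_k / (log_likelihood_k - log_likelihood_1)
```

In practice one would evaluate the chosen criterion for each candidate number of clusters and keep the number that minimizes it.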
