Abstract

Determining the number of clusters in a data set is an important and difficult problem in cluster analysis. This study proposes a new model-based clustering approach for estimating the number of clusters. In the proposed method, the number of components in each variable is determined using univariate Gaussian mixture models. The number of alternative cluster centres and mixture models is then determined from the number of components in the heterogeneous variables. Appropriate Gaussian mixture models are determined, for the first time, with the help of the "mixture model soft computing method" (MMSCM). Vector arrays showing the number and addresses of clusters in the appropriate Gaussian mixture models are created, and the best model is selected through information criteria from the parameter estimates of the models that fit these arrays. The clustering success of the proposed method was compared with that of the Gaussian mixture model clustering and model selection methods mclust, clustvarsel, varselLCM, selvarMix and vscc, available as R packages. Each method was used to determine the number of clusters for the synthetic-1, synthetic-2, Iris, and Landsat Satellite Image data sets and was evaluated by its correct classification rate (CCR). The results show that the proposed method performs better both in determining the number of clusters and in correct classification rate. The novelty of the study is a new model-based dimension reduction method for estimating the number of clusters, together with a deterministic clustering approach for clustering and classification on the reduced data.
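The first steps described above (univariate mixture fits per variable, then alternative cluster centres built from the per-variable components) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that candidate centres are formed as the Cartesian product of the per-variable component means, and it uses a plain EM fit with a deterministic quantile start and BIC for choosing the number of components. The function names, `k_max`, and `var_floor` are hypothetical choices made only for this illustration.

```python
import itertools
import math


def fit_gmm_1d(x, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (means, log_lik). log_lik is evaluated in the last E-step,
    i.e. one M-step behind the returned means (negligible once converged).
    """
    n = len(x)
    xs = sorted(x)
    # Deterministic start (assumption): means at evenly spaced quantiles,
    # equal weights, and the overall variance for every component.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(x) / n
    var_all = max(sum((v - mean_all) ** 2 for v in x) / n, var_floor)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    log_lik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        log_lik = 0.0
        for v in x:
            dens = [
                w * math.exp(-0.5 * (v - m) ** 2 / s) / math.sqrt(2 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = max(sum(dens), 1e-300)  # guard against underflow
            log_lik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:  # skip a component that has lost all its mass
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, var_floor)
    return means, log_lik


def bic(log_lik, n_params, n):
    """BIC in the 'lower is better' sign convention."""
    return -2.0 * log_lik + n_params * math.log(n)


def select_components(x, k_max=3):
    """Choose the number of components of one variable by minimum BIC."""
    best = None
    for k in range(1, k_max + 1):
        means, log_lik = fit_gmm_1d(x, k)
        # k means + k variances + (k - 1) free mixing weights
        score = bic(log_lik, 3 * k - 1, len(x))
        if best is None or score < best[0]:
            best = (score, k, means)
    return best[1], best[2]


def candidate_centres(columns, k_max=3):
    """Cartesian product of per-variable component means (an assumption)."""
    per_variable_means = [select_components(col, k_max)[1] for col in columns]
    return list(itertools.product(*per_variable_means))
```

Calling `candidate_centres([col_1, col_2])` on two lists of observations returns one tuple per combination of component means. In the approach described above, such alternatives would then be compared as full mixture models through information criteria, a step this sketch does not attempt.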

Highlights

  • Model-based clustering is widely used in cluster analysis for clustering data from the mixture of Gaussian distributions

  • According to the correct classification rates (CCRs) calculated for the mclust-based methods and the mixture model soft computing method (MMSCM), shown in Table 14, MMSCM achieved a higher success rate with 32%, in other words a better model fit, as a result of the components it determined and of dimension reduction, whereas the mclust-based methods indicated an incorrect number of components (2)

  • This study proposes a novel method for determining the clustering in Gaussian mixture models (GMM), based on the components of the heterogeneous variables in the data as identified by the mixture model soft computing method

Summary

Introduction

Model-based clustering is widely used in cluster analysis for clustering data drawn from a mixture of Gaussian distributions. The components in the heterogeneous variables are used to determine the number and location of clusters in the mixture model [6]. Akogul and Erisoglu proposed a model-based clustering method that uses the Analytic Hierarchy Process (AHP) to reveal the clustering in a data set. The most appropriate model is selected from among the candidate subclusters under an assumption based on the difference in Bayesian Information Criterion (BIC) values [12]. In cluster analysis, it is very important to prevent the loss of information carried by variables that are eliminated during variable selection.
