Abstract

Popular clustering ensemble algorithms do not adapt their processing to the characteristics of individual data sets. To overcome this limitation, a new clustering ensemble algorithm, the Enhanced Clustering Ensemble algorithm based on Characteristics of Data sets (ECECD), was proposed. ECECD is composed of three stages: generation of base clusterings, selection of base clusterings, and a consensus function. It selects a specific range of ensemble members to form the final ensemble and produces the final clustering based on the characteristics of the data set. Three benchmark data sets (ecoli, leukaemia and Vehicle) were clustered in the experiments, and the clustering errors obtained by the proposed algorithm were 0.014, 0.489 and 0.361 respectively, in each case the lowest among the compared algorithms, such as Bagging based Structure Ensemble Approach (BSEA), Hybrid Cluster Ensemble (HCE) and Cluster-Oriented Ensemble Classifier (COES). The Normalized Mutual Information (NMI) values of the proposed algorithm were also consistently higher than those of these algorithms as the number of candidate base clusterings increased. Therefore, compared with these popular clustering ensemble algorithms, the proposed algorithm has the highest clustering precision and the strongest scalability.
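
The two evaluation measures named above, clustering error and NMI, can be illustrated with a minimal Python sketch. The abstract does not say which NMI normalization or which error definition was used, so both choices here are assumptions: NMI is normalized by the arithmetic mean of the two entropies, and clustering error is the fraction of points left unmatched under the best one-to-one mapping between predicted clusters and true classes. The function names `nmi` and `clustering_error` are illustrative and not taken from the paper.

```python
import math
from collections import Counter
from itertools import permutations


def nmi(labels_a, labels_b):
    """NMI as 2 * I(A;B) / (H(A) + H(B)); this normalization is an assumption."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # Mutual information summed over the non-empty cells of the contingency table.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a + h_b == 0.0:
        return 1.0  # both partitions consist of a single cluster
    return 2.0 * mi / (h_a + h_b)


def clustering_error(true_labels, pred_labels):
    """Fraction of points misassigned under the best cluster-to-class mapping.

    Brute force over permutations, so only suitable for a small number of clusters.
    """
    n = len(true_labels)
    if n == 0 or n != len(pred_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    pad = object()  # placeholder class for clusters that have no class left to match
    classes = list(set(true_labels))
    clusters = list(set(pred_labels))
    classes += [pad] * max(0, len(clusters) - len(classes))
    overlap = Counter(zip(pred_labels, true_labels))
    best = max(
        sum(overlap[(c, t)] for c, t in zip(clusters, perm))
        for perm in permutations(classes, len(clusters))
    )
    return 1.0 - best / n
```

Both measures ignore the label names themselves: a clustering that swaps the names of two clusters but groups the points identically still scores an NMI of 1 and an error of 0.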

