Abstract
In this paper, a data clustering method named consensus fuzzy k-modes clustering is proposed to improve the performance of the clustering for the categorical data. At the same time, the coupling DNA-chain-hypergraph P system is constructed to realize the process of the clustering. This P system can prevent the clustering algorithm falling into the local optimum and realize the clustering process in implicit parallelism. The consensus fuzzy k-modes algorithm can combine the advantages of the fuzzy k-modes algorithm, weight fuzzy k-modes algorithm and genetic fuzzy k-modes algorithm. The fuzzy k-modes algorithm can realize the soft partition which is closer to reality, but treats all the variables equally. The weight fuzzy k-modes algorithm introduced the weight vector which strengthens the basic k-modes clustering by associating higher weights with features useful in analysis. These two methods are only improvements the k-modes algorithm itself. So, the genetic k-modes algorithm is proposed which used the genetic operations in the clustering process. In this paper, we examine these three kinds of k-modes algorithms and further introduce DNA genetic optimization operations in the final consensus process. Finally, we conduct experiments on the seven UCI datasets and compare the clustering results with another four categorical clustering algorithms. The experiment results and statistical test results show that our method can get better clustering results than the compared clustering algorithms, respectively.
Highlights
Data clustering has recently attracted more attentions in practical applications
At the same time, considering the structure of the DCHP system and the characteristics of the three basic clustering algorithms, we need to guarantee that the basic partitions (BPs) generated by the different algorithms are the same
We propose a novel P system (DCHP) with a hybrid structure which combines the advantage of the chain structure and hypergraph topology structure for the consensus fuzzy k-modes clustering
Summary
Data clustering has recently attracted more attentions in practical applications. In this method, the distance between the clustering center and the data objects are calculated by the standard distance metrics. There are many classification datasets that do not have a natural order or distance between the parts. In the real world, each classification attribute of blood type has a unique classification value, such as [A, B, O or AB]. Research into categorical data is a difficult and challenging task, which attracts many data mining researchers
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have