Abstract

High-dimensional data contain many irrelevant attributes, are sparsely distributed, and are expensive to compute on, so traditional clustering algorithms such as K-means perform poorly on them. To address this problem, this paper studies an integrated clustering method for high-dimensional data. A subspace division method based on minimum redundancy is proposed, in which the division is carried out by a modified K-means algorithm: the mutual information between the feature variables of the data replaces the distance calculation in K-means, and the data are divided into subspaces according to the mutual information values between those feature variables. To achieve both high clustering accuracy and diversity, this paper uses a genetic algorithm as the consistency integration (consensus) function. The fitness function is designed according to the clustering fusion target, and the selection operator is designed according to the maximum number of overlapping elements in the base clusterings. The proposed algorithm is compared with other commonly used clustering fusion algorithms, and the experimental results show that it outperforms them on most datasets, indicating that it is an effective clustering integration algorithm.
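
The subspace division step described above can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything here is an assumption made for illustration: mutual information is estimated from a 2-D histogram, the initial centers are chosen greedily as the least redundant features, and the grouping is a K-medoids-style loop over features in which mutual information takes the place of distance. The names `mutual_information`, `mi_matrix`, and `divide_subspaces` are hypothetical and are not taken from the paper.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) in nats from a 2-D histogram (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_matrix(X, bins=8):
    """Symmetric matrix of pairwise mutual information between feature columns."""
    d = X.shape[1]
    M = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            M[i, j] = M[j, i] = mutual_information(X[:, i], X[:, j], bins)
    return M

def divide_subspaces(X, k, bins=8, max_iter=20):
    """Group features into k subspaces, using MI in place of a distance.

    Illustrative K-medoids-style loop; not the paper's exact procedure.
    Returns an array giving the subspace index of each feature.
    """
    M = mi_matrix(X, bins)
    # Assumed initialization: start from feature 0, then repeatedly add the
    # feature that is least redundant with the centers chosen so far.
    centers = [0]
    while len(centers) < k:
        redundancy = M[:, centers].max(axis=1)
        redundancy[centers] = np.inf
        centers.append(int(np.argmin(redundancy)))
    for _ in range(max_iter):
        # Assign each feature to the center it shares the most information with.
        labels = np.argmax(M[:, centers], axis=1)
        labels[centers] = np.arange(k)    # keep every center in its own group
        # Move each center to the member with the highest total MI in its group.
        new_centers = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            totals = M[np.ix_(members, members)].sum(axis=1)
            new_centers.append(int(members[np.argmax(totals)]))
        if new_centers == centers:
            break
        centers = new_centers
    return labels
```

Calling `divide_subspaces(X, k)` on an `n_samples × n_features` array returns one subspace label per feature; a base clustering could then be run on the columns of each subspace separately before the results are fused.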
