Abstract

Cluster analysis, one of the core methods of data mining, is critical for discovering the natural structure of data and extracting useful information from massive datasets. However, many existing clustering algorithms suffer from low clustering accuracy and high sensitivity to noise points. These problems are particularly prominent in high-dimensional and large-scale clustering tasks. To overcome them, this paper proposes a new feature analysis-based elastic net algorithm with a clustering objective function (FAENC). The algorithm redefines a cost function based on the goal of clustering, and derives a new energy function for the clustering elastic net from this cost function and the maximum entropy principle. The proposed model is an unsupervised optimization method: by minimizing the energy function, it solves clustering problems through self-learning, without manual training or intervention. Additionally, a method for calculating the dispersion degree of the feature attributes is proposed, which allows noise attributes to be identified. Each feature attribute is then weighted automatically according to a weighting strategy, which eliminates the influence of noise variables and improves clustering quality and efficiency. The proposed FAENC algorithm significantly reduces the impact of the internal structure of the dataset, identifies clusters of different sizes, shapes, and densities, and achieves higher clustering quality. Compared with several classical and state-of-the-art clustering methods, FAENC substantially improves the accuracy of clustering results on a large number of synthetic and real-world datasets.
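
To make the idea of maximum-entropy clustering with weighted feature attributes more concrete, the following is a minimal sketch under stated assumptions, not the FAENC algorithm itself. The abstract does not give the energy function, the dispersion measure, or the weighting strategy, so the sketch substitutes a generic Gibbs (softmax) soft assignment over a feature-weighted squared distance. All function names, the `temperature` parameter, and the hand-picked `weights` are hypothetical and chosen only for illustration.

```python
import math


def weighted_sq_dist(x, c, w):
    # Squared Euclidean distance with one weight per feature attribute.
    # A weight of 0 removes that attribute from the comparison entirely.
    return sum(wf * (xf - cf) ** 2 for xf, cf, wf in zip(x, c, w))


def soft_assign(points, centers, weights, temperature):
    # Maximum-entropy (Gibbs) memberships: p(j | x) is proportional to
    # exp(-d(x, c_j) / T). This is a generic form, assumed here; it is
    # not taken from the paper.
    memberships = []
    for x in points:
        d = [weighted_sq_dist(x, c, weights) for c in centers]
        d_min = min(d)  # shift by the minimum for numerical stability
        e = [math.exp(-(dj - d_min) / temperature) for dj in d]
        z = sum(e)
        memberships.append([ej / z for ej in e])
    return memberships


def update_centers(points, memberships, centers):
    # Each new center is the membership-weighted mean of all points.
    # A center with zero total membership is left where it was.
    dim = len(points[0])
    new_centers = []
    for j, old in enumerate(centers):
        mass = sum(m[j] for m in memberships)
        if mass == 0.0:
            new_centers.append(list(old))
            continue
        new_centers.append([
            sum(m[j] * x[f] for x, m in zip(points, memberships)) / mass
            for f in range(dim)
        ])
    return new_centers


# Toy data: feature 0 separates two groups, feature 1 is pure noise.
points = [[0.0, 9.0], [0.2, -7.0], [5.0, 8.0], [5.2, -6.0]]
weights = [1.0, 0.0]  # hand-picked here; the paper derives weights automatically
centers = [[1.0, 0.0], [4.0, 0.0]]

memberships = soft_assign(points, centers, weights, temperature=0.5)
centers = update_centers(points, memberships, centers)
```

In this toy setting the noisy second attribute receives weight 0, so the memberships depend only on the informative first attribute. How such weights are obtained from the dispersion degree of each attribute is the paper's contribution and is not reproduced here.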
