Abstract

Clustering is one of the most important topics in data mining and machine learning. The density peaks clustering (DPC) algorithm is a well-known density-based clustering method that can efficiently and effectively handle non-spherical clusters. However, DPC computes the local density and the distance measure in simple ways that tend to ignore the correlation and similarity between samples, and its manually set parameters strongly influence the clustering results; as a result, DPC performs poorly on high-dimensional datasets. To address these issues, this paper presents an adaptive DPC algorithm with Fisher linear discriminant for clustering complex datasets, called ADPC-FLD. First, a kernel density estimation function is introduced to calculate the local density of the sample points, and the Pearson correlation coefficient between samples is used as a weight to construct a weighted Euclidean distance function, so that the distance between samples reflects both their spatial structure and their correlation. Second, a novel density estimation entropy is proposed; by minimizing this entropy, the density estimation parameters are selected adaptively according to the distribution characteristics of the data, which eliminates the influence of manual parameter setting. Third, an adaptive cluster center selection strategy is designed to avoid both the errors caused by selecting noisy points as cluster centers and the uncertainty of selecting cluster centers manually. Finally, the Fisher linear discriminant algorithm is used to eliminate irrelevant information and reduce the dimensionality of high-dimensional data, after which the adaptive DPC method is applied. The proposed algorithm is compared with other related algorithms on six synthetic datasets, thirteen UCI datasets, and seven gene expression datasets.
Experimental results on these 26 datasets show that the proposed algorithm significantly outperforms several leading clustering approaches in terms of clustering accuracy and efficiency.
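The abstract does not give the paper's formulas, so the following is only a minimal sketch of the standard DPC skeleton that ADPC-FLD builds on, written in plain Python: a Gaussian-kernel local density ρ_i, the distance δ_i from each point to its nearest denser point, and cluster centers chosen by γ_i = ρ_i·δ_i. The function `weighted_distance`, with weight 1 - r/2 for Pearson correlation r, is a hypothetical stand-in for the paper's Pearson-weighted Euclidean distance. The bandwidth `h` and the number of centers `k` are set by hand here, whereas the paper selects them through density estimation entropy and its adaptive center selection strategy; the Fisher linear discriminant step is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation between two samples, treating their features as observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den > 0 else 0.0  # constant sample: treat as uncorrelated


def weighted_distance(x, y):
    """HYPOTHETICAL Pearson-weighted Euclidean distance (not the paper's exact formula).

    The weight 1 - r/2 lies in [0.5, 1.5]: positively correlated samples are
    pulled closer together, negatively correlated samples are pushed apart.
    """
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return (1.0 - pearson(x, y) / 2.0) * euclid


def dpc(points, h, k):
    """Density peaks clustering with a Gaussian-kernel local density.

    h: kernel bandwidth (fixed by hand here; ADPC-FLD selects it adaptively).
    k: number of cluster centers (fixed by hand here; ADPC-FLD selects them adaptively).
    Returns one integer label per point.
    """
    n = len(points)
    d = [[weighted_distance(p, q) for q in points] for p in points]
    # Local density: sum of Gaussian kernel values over all other points.
    rho = [sum(math.exp(-((d[i][j] / h) ** 2) / 2) for j in range(n) if j != i)
           for i in range(n)]
    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    nearest_higher = [None] * n
    # The densest point gets the largest distance to any other point.
    delta[order[0]] = max(d[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # Nearest point among those with higher density (earlier in the order).
        j = min(order[:rank], key=lambda m: d[i][m])
        delta[i], nearest_higher[i] = d[i][j], j
    # Centers: the k points with the largest gamma = rho * delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:k]
    labels = {c: label for label, c in enumerate(centers)}
    # Assign every other point the label of its nearest denser neighbor.
    for i in order:
        if i in labels:
            continue
        if nearest_higher[i] is None:  # densest point was not chosen as a center
            labels[i] = labels[min(centers, key=lambda c: d[i][c])]
        else:
            labels[i] = labels[nearest_higher[i]]
    return [labels[i] for i in range(n)]
```

A small usage example with two groups of five-dimensional samples that have opposite feature profiles:

```python
cluster_a = [[1, 2, 3, 4, 5], [1.1, 2, 3.1, 4, 5], [1, 2.1, 3, 4.1, 5], [0.9, 2, 2.9, 4, 5.1]]
cluster_b = [[10, 8, 6, 4, 2], [10.1, 8, 6, 4.1, 2], [10, 8.1, 6, 4, 2.1], [9.9, 8, 6.1, 4, 2]]
labels = dpc(cluster_a + cluster_b, h=1.0, k=2)
print(labels)  # the first four samples share one label, the last four share another
```

Sorting by density and linking each point to its nearest denser neighbor gives the usual single-pass DPC assignment; the quadratic distance matrix keeps the sketch short but limits it to small datasets.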
