Abstract

The clustering quality of the spectral clustering algorithm depends on how the similarity between samples is calculated. Using the Gaussian kernel function to calculate this similarity can give good clustering results, but the outcome relies on the setting of the kernel parameter. Therefore, an adaptive density-sensitive similarity measure based spectral clustering (DSSC) algorithm is proposed to improve the clustering quality. First, the Euclidean distances between samples are calculated to obtain the nearest neighbors of each sample. Second, the standard deviation of the distances between each sample and its nearest neighbors is calculated as the density parameter. Third, the density-sensitive distances between each sample and its nearest neighbors are calculated. Finally, the similarities between each sample and its nearest neighbors are calculated to construct a similarity matrix. In addition, the proposed DSSC algorithm is parallelized on the Dask distributed parallel computing platform with CPU+GPU, which improves its computational efficiency by using both CPU and GPU resources. A series of experiments on several synthetic datasets and UCI datasets is conducted to verify the effectiveness of the proposed DSSC algorithm. The results show that the DSSC algorithm not only achieves satisfactory clustering results but also performs large-scale clustering analysis more efficiently.
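The four similarity-construction steps can be illustrated with a minimal NumPy sketch. The abstract does not give the formulas for the density-sensitive distance or the similarity, so the forms used here are assumptions borrowed from the well-known local-scaling idea: the squared Euclidean distance divided by the product of the two samples' density parameters, passed through a negative exponential. The function name `knn_density_similarity`, the neighbor count `k`, and the symmetrization by element-wise maximum are likewise hypothetical choices, not the authors' method. The Dask CPU+GPU parallelization is not shown.

```python
import numpy as np

def knn_density_similarity(X, k=5):
    """Sketch of a kNN, density-scaled similarity matrix (assumed formulas)."""
    n = X.shape[0]

    # Step 1: pairwise Euclidean distances, then each sample's k nearest neighbors.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)           # a sample is not its own neighbor
    nbrs = np.argsort(masked, axis=1)[:, :k]   # shape (n, k)
    nbr_dist = np.take_along_axis(dist, nbrs, axis=1)

    # Step 2: density parameter = standard deviation of the distances
    # from each sample to its nearest neighbors (small epsilon avoids 0).
    sigma = nbr_dist.std(axis=1) + 1e-12

    # Step 3 (ASSUMPTION): density-sensitive distance taken as
    # d_ij^2 / (sigma_i * sigma_j); the source does not state its formula.
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    ds_dist = dist[rows, cols] ** 2 / (sigma[rows] * sigma[cols])

    # Step 4 (ASSUMPTION): similarity = exp(-density-sensitive distance),
    # stored only for neighbor pairs and symmetrized with an element-wise max.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-ds_dist)
    return np.maximum(W, W.T)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated 2-D blobs as toy data.
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
    W = knn_density_similarity(X, k=5)
    print(W.shape)
```

A matrix built this way could then be handed to any spectral clustering routine that accepts a precomputed affinity. Use `k >= 2`, since with a single neighbor the standard deviation is zero and the density parameter carries no information.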

Highlights

  • The clustering algorithm [1] is one of the unsupervised learning algorithms commonly used for data mining; its purpose is to place samples of the same class in the same cluster as far as possible

  • Yang et al. [15] proposed a spectral clustering algorithm based on density-sensitive similarity, which uses an adjustable line segment length measure to calculate the distances between samples and construct a similarity matrix, and which constructs a random matrix based on the Markov chain

  • An adaptive density-sensitive similarity measure based spectral clustering algorithm is proposed, which calculates the similarities between samples and their nearest neighbors more accurately and thereby improves the clustering quality to a certain extent

Summary

INTRODUCTION

The clustering algorithm [1] is one of the unsupervised learning algorithms commonly used for data mining; its purpose is to place samples of the same class in the same cluster as far as possible. Zhang et al. [14] proposed a spectral clustering algorithm based on local density adaptive similarity, which adopts the common near neighbor measure to construct a similarity matrix. Yang et al. [15] proposed a spectral clustering algorithm based on density-sensitive similarity, which uses an adjustable line segment length measure to calculate the distances between samples and construct a similarity matrix, and which constructs a random matrix based on the Markov chain. Although the above research can effectively reduce the running time of the spectral clustering algorithm, how to fully utilize all available computing resources of a cluster to improve the efficiency of large-scale clustering analysis is still a challenge. An adaptive density-sensitive similarity measure based spectral clustering algorithm is proposed, which calculates the similarities between samples and their nearest neighbors more accurately and thereby improves the clustering quality to a certain extent.

OVERVIEW OF NJW ALGORITHM
THE PROPOSED DSSC ALGORITHM
PARALLELIZATION OF THE DSSC ALGORITHM
ANALYSIS OF TIME COMPLEXITY
ANALYSIS OF COMPUTATIONAL EFFICIENCY
Worker Nodes
Findings
CONCLUSION
