Abstract
The number of clusters directly affects clustering quality and the usefulness of the result in real-world applications, and determining it is one of the key problems in cluster analysis. In singular value decomposition (SVD), the directions associated with the larger singular values tend to capture the primary patterns, trends, or structures in the data, that is, its main information. In cluster analysis, this main information is likely related to the cluster structure itself: the number of relatively large singular values may correspond to the number of clusters, and the information they carry may correspond to the individual clusters. Based on this observation, a singular-value-based method for detecting the cluster number is proposed. First, a transferred K-nearest neighbors (TKNN) density formula is proposed to address the failure of the density peaks clustering (DPC) algorithm to identify centroids in the sparse clusters of unbalanced datasets. Second, core data are selected by the DPC algorithm with this modified density formula to better capture the data distribution. Third, a sparse similarity matrix is constructed from the selected core data to further highlight the relationships between data points and sharpen the features of the data distribution. Finally, SVD is performed on the sparse similarity matrix to obtain its singular values, and the cumulative contribution rate is used to determine the number of relatively large singular values (i.e., the cluster number). Experimental results show that our method is superior at determining the cluster number for datasets with complex shapes.
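The final step, counting the relatively large singular values through a cumulative contribution rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.9 threshold, and the idealized block-diagonal similarity matrix are all assumptions made for illustration, and the TKNN density, DPC core-data selection, and sparse similarity construction described above are not reproduced here.

```python
import numpy as np


def estimate_cluster_number(similarity, threshold=0.9):
    """Return the smallest k whose top-k singular values reach `threshold`
    of the total singular-value sum (the cumulative contribution rate).

    `threshold` is a hypothetical choice; the abstract does not state a value.
    """
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(similarity, compute_uv=False)
    total = singular_values.sum()
    if total == 0:
        return 0
    cumulative = np.cumsum(singular_values) / total
    # First index where the cumulative rate reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(singular_values))


def toy_block_similarity(sizes):
    """Build an idealized similarity matrix (assumption): points in the same
    cluster have similarity 1, points in different clusters have similarity 0."""
    n = sum(sizes)
    similarity = np.zeros((n, n))
    start = 0
    for size in sizes:
        similarity[start:start + size, start:start + size] = 1.0
        start += size
    return similarity


# Three idealized clusters of sizes 5, 4, and 3.
toy = toy_block_similarity([5, 4, 3])
print(estimate_cluster_number(toy, threshold=0.9))
```

For this toy matrix the nonzero singular values are 5, 4, and 3, so the cumulative contribution rate first reaches 0.9 at the third singular value. On real data the similarity matrix is noisy, and the estimate is sensitive to both the threshold and how the matrix is sparsified.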