In traditional adaptive neighbors graph learning (ANGL)-based clustering, the time complexity exceeds O(n²), where n is the number of data points, which is not scalable for large-scale data problems in real applications. In addition, ANGL adds a balance regularization to its objective function to avoid the sparse over-fitting problem in the learned similarity graph matrix; however, this regularization may lead to many weak connections between data points in different clusters. To address these problems, we propose a new fast clustering method, namely, Adaptive Neighbors Graph Learning for Large-Scale Data Clustering using Vector Quantization and Self-Regularization (ANGL-LDC), which performs vector quantization (VQ) on the original data and feeds the obtained VQ data as the input to the n×n similarity graph matrix learning. Hence, the n×n similarity graph matrix learning problem is simplified to a weighted m×m (m≪n) graph learning problem, where m is the number of distinct points and the weight of each distinct point is the number of times it is duplicated in the VQ data. Consequently, the time complexity of ANGL-LDC is much lower than that of ANGL. At the same time, we propose a new ANGL objective function with a graph connection self-regularization mechanism, under which the ANGL-LDC objective function takes an infinite value if any single graph connection equals 1. Therefore, ANGL-LDC naturally avoids the sparse over-fitting problem, since minimizing its objective function rules out such degenerate solutions. Experimental results on synthetic and real-world datasets demonstrate the scalability and effectiveness of ANGL-LDC.
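The VQ preprocessing step described above can be illustrated with a minimal sketch: quantize the n points onto a coarse grid, then keep only the m distinct quantized points together with their multiplicities, which serve as the weights in the reduced m×m graph learning problem. The grid resolution `step`, the function name `vq_compress`, and the synthetic data are illustrative assumptions, not the paper's actual quantizer.

```python
import numpy as np

def vq_compress(X, step=0.5):
    """Quantize rows of X to a grid of resolution `step` and return the
    distinct quantized points with their duplicate counts (weights)."""
    Xq = np.round(X / step) * step                        # snap each point to the grid
    distinct, counts = np.unique(Xq, axis=0, return_counts=True)
    return distinct, counts

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                            # n = 1000 original points
points, weights = vq_compress(X)                          # m distinct points, m << n
# The weights sum back to n, so no mass is lost by the compression.
print(points.shape[0], int(weights.sum()))
```

Graph learning then operates on the m weighted points instead of all n originals, which is where the claimed complexity reduction comes from.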