Canopy clustering determines the number of clusters dynamically rather than requiring it to be fixed in advance, which makes it well suited to large and complex datasets. However, its performance depends heavily on the distance thresholds T1 and T2, which are usually tuned by hand, a process that is time-consuming and inefficient. This study enhances the Canopy clustering algorithm by automating the search for suitable threshold ranges with intelligent optimization algorithms. We propose a novel framework that integrates Simulated Annealing (SA), Particle Swarm Optimization (PSO), and Snake Optimization (SO) to determine the optimal values of T1 and T2 automatically. To handle high-dimensional data, we additionally apply dimensionality reduction techniques: t-distributed Stochastic Neighbor Embedding (t-SNE), Stochastic Neighbor Embedding (SNE), and Kernel Principal Component Analysis (KPCA). The silhouette coefficient serves as the fitness function for evaluating clustering performance. Experiments on the Wine, Iris, and MNIST subset datasets show that the proposed optimization-based Canopy clustering framework improves clustering quality, as measured by the silhouette coefficient, by up to 21% on the Wine dataset and 19% on the Iris dataset (relative to k-means) compared with traditional methods. Specifically, on the Wine dataset the optimized Canopy clustering achieved a silhouette coefficient of 0.63, a 21% improvement over the original 0.52. On the Iris dataset the optimized method reached 0.62, outperforming k-means (0.52) and manually tuned Canopy clustering (0.55). These results highlight the effectiveness of intelligent optimization algorithms in improving the adaptability and efficiency of clustering.
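The search loop described above (Canopy clustering parameterized by T1 and T2, the silhouette coefficient as fitness, and a metaheuristic proposing new threshold pairs) can be illustrated with a minimal Python sketch. It is not the authors' implementation and covers only the SA variant; PSO, SO, and the dimensionality-reduction step are omitted. Several choices are assumptions made for illustration: hard labels are obtained by assigning each point to its nearest canopy center, infeasible pairs (T1 <= T2) receive the worst possible fitness of -1, and the helper names (`make_blobs`, `anneal_thresholds`, and so on), the cooling schedule, and the step size are hypothetical.

```python
import math
import random


def make_blobs(centers, n_per_center, std, seed=0):
    """Hypothetical synthetic Gaussian blobs for the demo (not a dataset from the study)."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, std) for c in center)
            for center in centers for _ in range(n_per_center)]


def canopy(points, t1, t2):
    """Classic Canopy clustering with loose threshold t1 > tight threshold t2.

    Returns (center_index, member_indices) pairs; canopies may overlap.
    """
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]  # first remaining point as center (often chosen at random)
        members, still_candidates = [], []
        for i in remaining:
            d = math.dist(points[center], points[i])
            if d < t1:
                members.append(i)  # inside loose threshold: joins this canopy
            if d >= t2 and i != center:
                still_candidates.append(i)  # outside tight threshold: may seed a later canopy
        canopies.append((center, members))
        remaining = still_candidates
    return canopies


def assign_to_nearest_center(points, canopies):
    """Assumed labeling step: each point takes the label of its nearest canopy center."""
    centers = [c for c, _ in canopies]
    return [min(range(len(centers)), key=lambda k: math.dist(p, points[centers[k]]))
            for p in points]


def silhouette(points, labels):
    """Mean silhouette coefficient; returns -1.0 when undefined (too few or too many clusters)."""
    n = len(points)
    clusters = sorted(set(labels))
    if len(clusters) < 2 or len(clusters) >= n:
        return -1.0
    dmat = [[math.dist(p, q) for q in points] for p in points]
    total = 0.0
    for i in range(n):
        own = labels[i]
        same = [dmat[i][j] for j in range(n) if j != i and labels[j] == own]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dmat[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != own
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def fitness(points, t1, t2):
    """Silhouette of the Canopy result; infeasible pairs (need T1 > T2 > 0) score -1."""
    if t1 <= t2 or t2 <= 0:
        return -1.0
    labels = assign_to_nearest_center(points, canopy(points, t1, t2))
    return silhouette(points, labels)


def anneal_thresholds(points, init=(3.0, 1.5), iters=200, temp0=0.5,
                      cooling=0.97, step=0.3, seed=0):
    """Simulated annealing over (T1, T2), maximizing the silhouette fitness."""
    rng = random.Random(seed)
    cur = list(init)
    cur_f = fitness(points, *cur)
    best, best_f = cur[:], cur_f
    temp = temp0
    for _ in range(iters):
        # Propose a Gaussian perturbation of (T1, T2), kept strictly positive.
        cand = [max(1e-3, x + rng.gauss(0.0, step)) for x in cur]
        cand_f = fitness(points, *cand)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(delta / temp).
        if cand_f >= cur_f or rng.random() < math.exp((cand_f - cur_f) / temp):
            cur, cur_f = cand, cand_f
            if cur_f > best_f:
                best, best_f = cur[:], cur_f
        temp *= cooling  # geometric cooling schedule
    return best, best_f


if __name__ == "__main__":
    pts = make_blobs([(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)], 15, 0.5, seed=1)
    (t1, t2), score = anneal_thresholds(pts, iters=150)
    print(f"T1={t1:.2f}  T2={t2:.2f}  silhouette={score:.3f}")
```

Under nearest-center labeling, the hard labels depend on T2 alone, because T2 fixes which points become canopy centers; T1 enters this sketch only through the T1 > T2 constraint. A pipeline that uses the overlapping canopies downstream, for example to seed k-means, would make T1 influence the result directly. Swapping SA for PSO or SO would change only `anneal_thresholds`; the fitness function stays the same.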