Abstract Clustering and statistical analysis of high-dimensional sparse data pose challenges that do not arise with traditional low-dimensional data. To address them, this paper proposes an approach that applies fuzzy data principles to improve clustering and statistical performance on high-dimensional sparse datasets. The method builds on the fuzzy C-means clustering algorithm and modifies it in two ways to suit such data. First, it addresses the algorithm's tendency to converge to local optima by optimizing the initial cluster centers, which significantly reduces the time required for clustering and statistics. Second, it replaces the original Euclidean distance with cosine distance, which improves clustering and statistical performance on high-dimensional sparse data. Experimental results show that the method achieves superior clustering and statistical performance across different data dimensions. When the data dimension is low, performance is best at a blocking ratio of 10%; when the data dimension is high, it is best at a blocking ratio of 40%. The method also achieves higher hit rates and higher clustering and statistical efficiency at different sparsity levels.
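The two modifications described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions. The abstract does not say how the initial centers are optimized, so a farthest-first selection stands in for that step. The objective is taken to be the membership-weighted sum of cosine distances, which gives the membership exponent -1/(m-1). The function names (`cosine_distance`, `farthest_first_init`, `fuzzy_c_means_cosine`) and the toy data are hypothetical.

```python
import numpy as np

def _unit_rows(A, eps=1e-12):
    """Scale each row to unit length; all-zero rows stay zero."""
    return A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)

def cosine_distance(X, C):
    """Pairwise cosine distance 1 - cos(x, c) between rows of X and rows of C."""
    return np.clip(1.0 - _unit_rows(X) @ _unit_rows(C).T, 0.0, 2.0)

def farthest_first_init(X, k):
    """Assumed stand-in for the paper's center optimization: start from
    row 0, then repeatedly add the row farthest from the chosen centers."""
    chosen = [0]
    d = cosine_distance(X, X[chosen])[:, 0]
    for _ in range(1, k):
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, cosine_distance(X, X[[nxt]])[:, 0])
    return X[chosen].astype(float)

def fuzzy_c_means_cosine(X, k, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy C-means with cosine distance (assumed objective: sum of u^m * d)."""
    Xn = _unit_rows(np.asarray(X, dtype=float))
    C = farthest_first_init(Xn, k)
    U = np.full((Xn.shape[0], k), 1.0 / k)
    for _ in range(max_iter):
        D = cosine_distance(Xn, C) + 1e-12      # guard against division by zero
        W = D ** (-1.0 / (m - 1.0))             # membership weights
        U_new = W / W.sum(axis=1, keepdims=True)
        # For unit-length rows, the center that minimizes the weighted cosine
        # distance is the normalized weighted sum of the rows.
        C = _unit_rows((U_new ** m).T @ Xn)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, C

# Toy sparse data: rows 0-2 are active in the first two dimensions,
# rows 3-5 in the fourth and fifth.
X = np.array([
    [3, 1, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0],
    [5, 2, 0, 0, 0, 0],
    [0, 0, 0, 4, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 6, 2, 0],
], dtype=float)
U, C = fuzzy_c_means_cosine(X, k=2)
labels = U.argmax(axis=1)
```

Cosine distance depends only on the direction of a vector, so rows that share the same non-zero dimensions fall into the same cluster regardless of their magnitudes. This is one plausible reason it suits sparse data better than Euclidean distance.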