Fuzzy c-means (FCM) is a well-known clustering method with wide applications in statistics, pattern recognition and data mining. However, its performance on large-scale, high-dimensional data is not satisfactory. In this paper, we propose the sparse fuzzy c-means (SFCM) algorithm, which reformulates traditional FCM to handle high-dimensional data clustering, based on Witten's sparse clustering framework. SFCM embeds feature selection into FCM via sparse weighting, which makes the model easier to interpret. Experiments and comparisons indicate that the method is able to select important features and also improves efficiency on large-scale clustering problems.