Abstract
We consider the problem of clustering observations described by a large set of features. Not all features are necessarily relevant for determining the true underlying clusters, which may differ with respect to only a small number of features. We propose Sparse subspace K-means (SSKM), a new subspace clustering method that simultaneously clusters the observations and selects the relevant features for each cluster. The method is based on a single criterion with a lasso-type penalty that drives both the selection of relevant features and the clustering, as in sparse K-means. Unlike sparse K-means, which associates a single set of relevant features with the whole partition, the proposed method associates a set of relevant features with each cluster. The method is demonstrated on simulated and real data. Compared with K-means, sparse K-means, and Entropy Weighting K-Means (EWKM), SSKM performs better both in terms of partition quality indices and in the detection of relevant features.