Abstract

We consider the problem of clustering observations described by a large set of features. Not all of these features are necessarily relevant for determining the true underlying clusters, which may differ with respect to only a small number of features. We propose Sparse subspace K-means (SSKM), a new subspace clustering method that simultaneously clusters the observations and selects the relevant features for each cluster. As in sparse K-means, the method is based on a single criterion with a lasso-type penalty that drives both the feature selection and the clustering. Unlike sparse K-means, which associates one set of relevant features with the whole partition, the proposed method associates a set of relevant features with each cluster. The method is demonstrated on simulated and real data. Compared with K-means, sparse K-means, and Entropy Weighting K-Means (EWKM), SSKM performs better both in terms of partition quality indices and in the detection of relevant features.
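
To make the lasso-type penalty concrete, the following is a minimal sketch of the feature-weight update commonly used in sparse K-means: soft-threshold a nonnegative per-feature score, rescale to unit L2 norm, and pick the threshold by bisection so that the L1 norm of the weights stays below a bound. It is not the SSKM criterion itself, which the abstract does not spell out. The names `sparse_weights`, `scores`, and `l1_bound` are illustrative assumptions.

```python
import numpy as np


def soft_threshold(a, delta):
    """Shrink each entry of a toward zero by delta (the lasso proximal step)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def sparse_weights(scores, l1_bound, n_iter=60):
    """Return weights w with ||w||_2 = 1 and ||w||_1 <= l1_bound.

    scores   : per-feature relevance scores (assumed here to be something
               like a between-cluster sum of squares); negatives are clipped.
    l1_bound : sparsity bound, assumed to be >= 1.
    """
    a = np.maximum(np.asarray(scores, dtype=float), 0.0)
    if not a.any():
        raise ValueError("all scores are zero; weights are undefined")
    if l1_bound < 1.0:
        raise ValueError("l1_bound must be at least 1")

    w = a / np.linalg.norm(a)
    if w.sum() <= l1_bound:
        return w  # constraint already satisfied, no shrinkage needed

    lo, hi = 0.0, a.max()
    best = None
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)  # always strictly below a.max()
        w = soft_threshold(a, mid)
        w = w / np.linalg.norm(w)
        if w.sum() > l1_bound:
            lo = mid  # still too dense: shrink harder
        else:
            hi = mid  # feasible: remember it and try shrinking less
            best = w
    # If ties at the maximum made the bound unreachable, fall back to the
    # sparsest weights tried rather than failing.
    return best if best is not None else w


# Two informative features and two near-noise features.
w = sparse_weights([10.0, 5.0, 0.1, 0.05], l1_bound=1.2)
print(np.round(w, 3))
```

Under the assumption that a per-cluster method keeps one such weight vector per cluster, the same update could be applied row by row to a clusters-by-features score matrix, so that each cluster ends up with its own subset of nonzero weights.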
