Abstract

High-dimensional data clustering has gained attention in recent years because of its widespread applications in domains such as social networking and biology. As a result of advances in data gathering and data storage technologies, a single data object is often represented by many attributes. Although more data may provide new insights, it may also hinder the knowledge discovery process by cluttering the interesting relations with redundant information. The traditional definition of similarity becomes meaningless in high-dimensional data, so clustering methods based on similarity between objects fail to cope with increased dimensionality. A dataset with large dimensionality can be better described in its subspaces than as a whole. Subspace clustering algorithms identify clusters existing in multiple, overlapping subspaces. These methods are further classified as top-down or bottom-up algorithms, depending on the strategy applied to identify subspaces. Top-down algorithms perform an initial clustering on the full set of dimensions and then iterate, removing irrelevant dimensions to identify the subset of dimensions that better represents the subspaces. Bottom-up algorithms start with low-dimensional spaces and merge dense regions using Apriori-based hierarchical clustering methods. It has been observed that the performance and the quality of results of a subspace clustering algorithm are highly dependent on the parameter values input to the algorithm. This paper gives an overview of work done in the field of subspace clustering.
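The bottom-up strategy described above can be illustrated with a minimal sketch. It assumes a fixed grid over attributes scaled to [0, 1) and a simple point-count density threshold; the function names (`bin_of`, `dense_units`, `bottom_up`) and the parameters `bins` and `min_points` are hypothetical choices for illustration and are not taken from any specific algorithm surveyed in the paper.

```python
from collections import Counter
from itertools import combinations


def bin_of(value, bins):
    """Map a value in [0, 1) to a grid interval index (assumed equal-width grid)."""
    return min(int(value * bins), bins - 1)


def dense_units(points, dims, bins, min_points):
    """Grid cells in the subspace `dims` that contain at least `min_points` points.

    A cell is a tuple of (dimension, interval) pairs, ordered by dimension.
    """
    counts = Counter(
        tuple((d, bin_of(p[d], bins)) for d in dims) for p in points
    )
    return {cell for cell, n in counts.items() if n >= min_points}


def bottom_up(points, bins=4, min_points=3):
    """Return {k: set of dense k-dimensional units}, built level by level."""
    n_dims = len(points[0])
    # Level 1: dense intervals in every single dimension.
    level = set()
    for d in range(n_dims):
        level |= dense_units(points, (d,), bins, min_points)
    result = {1: level} if level else {}
    k = 1
    while level:
        k += 1
        # Join step: combine two dense (k-1)-units into one k-unit candidate.
        candidates = set()
        for a, b in combinations(level, 2):
            merged = tuple(sorted(set(a) | set(b)))
            distinct_dims = {d for d, _ in merged}
            if len(merged) == k and len(distinct_dims) == k:
                candidates.add(merged)
        # Apriori pruning: every (k-1)-dimensional projection must be dense.
        candidates = {
            c for c in candidates
            if all(sub in level for sub in combinations(c, k - 1))
        }
        # Keep only candidates that are dense in the k-dimensional subspace.
        level = {
            c for c in candidates
            if sum(all(bin_of(p[d], bins) == b for d, b in c) for p in points)
            >= min_points
        }
        if level:
            result[k] = level
    return result


# Toy data: four points agree on dimensions 0 and 1 but are spread along 2.
points = [
    (0.10, 0.12, 0.05), (0.12, 0.10, 0.30), (0.15, 0.11, 0.55),
    (0.11, 0.14, 0.80), (0.60, 0.90, 0.10), (0.90, 0.40, 0.60),
]
result = bottom_up(points, bins=4, min_points=3)
```

On this toy data the only dense two-dimensional unit lies in the subspace formed by dimensions 0 and 1, and no dense unit involves dimension 2. Changing `bins` or `min_points` changes which units count as dense, which reflects the parameter sensitivity noted in the abstract.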
