Abstract

Finding clusters of data objects in high dimensional space is challenging, especially considering that such data can be sparse and highly skewed. This paper focuses on using Concept Lattice to solve high dimensional sparse data clustering problem. Concept Lattice Theory is an effective tool for data analysis and knowledge processing, which integrates the concept intent (attribute) and concept extent (object), and describes the hierarchical relationship of concept nodes. The construction of concept lattice itself is a process of concept clustering, but it produces a huge number of concept nodes due to its own completeness. Whereas we are not interested in the concept nodes whose extent is too large or too small. This paper proposes an effective high dimensional sparse data Clustering Algorithm Based On Concept Feature Vector (CABOCFV), which reduces the redundancy of concept construction using `Concept Sparse Feature Distance' and `Concept Feature Vector' and raises an effective noise recognition strategy. CABOCFV clustering algorithm is not susceptible to the input order of data objects, and scans the database only once. Experiments show that CABOCFV is effective and efficient for high dimensional sparse data clustering.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call