Abstract

Subspace outlier mining is important in big data analysis, and the efficiency of mining outliers in subspaces depends to a large extent on the subspace clustering algorithm used. To address the unstable and complex selection of the best clustering subspaces by the CMI method, chain-rule formulas for Cumulative Entropy, Cumulative Total Correlation, and Cumulative Holoentropy are given. Cumulative Holoentropy is used to mine the best clustering subspaces on continuous data sets, and outliers are then detected in those subspaces. On this basis, a subspace outlier detection algorithm based on Cumulative Holoentropy is proposed. Finally, the validity and scalability of the proposed method are tested on real and synthetic data sets. Experiments show that the proposed algorithm improves the efficiency of mining outliers in subspaces.
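
The abstract does not define Cumulative Entropy, so the following is only an illustrative sketch. It assumes the common definition for a continuous variable, -∫ P(X ≤ x) log P(X ≤ x) dx, and estimates it from a sample by replacing the distribution function with the empirical one. The function name `cumulative_entropy` is hypothetical, and this is not presented as the estimator used in the paper.

```python
import math


def cumulative_entropy(samples):
    """Empirical cumulative entropy of a one-dimensional sample.

    Assumed definition: -integral of F(x) * log F(x) dx, where F is the
    empirical distribution function of the sample (an assumption, not
    taken from the abstract).
    """
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        # A single point (or an empty sample) has no spread to measure.
        return 0.0
    total = 0.0
    for i in range(1, n):
        # Between xs[i-1] and xs[i] the empirical CDF is constant at i/n.
        p = i / n
        total += (xs[i] - xs[i - 1]) * p * math.log(p)
    return -total
```

Unlike differential entropy, this quantity is computed directly from sorted samples without density estimation, and it is non-negative and scales linearly with the data: doubling every sample value doubles the result.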
