Abstract

We propose WATCH, a weighted outlier mining method for identifying outliers in high-dimensional categorical datasets. WATCH consists of two modules: 1) feature grouping, which is based on measuring the correlation among features, and 2) outlier mining, which assigns scores to objects in each feature group. At the heart of WATCH is the feature grouping module, which partitions the features into multiple groups so that different aspects of the feature patterns can be discovered in each group. The outlier mining module then detects outliers in the high-dimensional categorical dataset. Apart from the number of outliers, which is specified by the user, WATCH avoids the need to tune any user-given parameter. We implement and evaluate WATCH using synthetic and real-world datasets. Our experimental results show that WATCH is a promising and practical algorithm for detecting outliers in high-dimensional categorical datasets, because it achieves high performance in terms of precision, efficiency, and interpretability.
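The two-module structure described above can be illustrated with a minimal sketch. It is not the WATCH algorithm itself: the abstract does not specify the correlation measure, the grouping procedure, or the weighting scheme, so the sketch assumes mutual information as the correlation measure, a greedy grouping rule, and a simple rarity score. The function names and the `threshold` parameter are hypothetical and exist only for illustration (WATCH itself is described as needing no such parameter).

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def group_features(rows, threshold):
    """Assumed greedy rule: a feature joins the first group whose seed
    feature shares at least `threshold` mutual information with it."""
    m = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]
    groups = []
    for j in range(m):
        for g in groups:
            if mutual_information(cols[g[0]], cols[j]) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])  # no correlated group found: start a new one
    return groups


def outlier_scores(rows, groups):
    """Assumed score: for each group, add 1 minus the relative frequency
    of the object's value combination, so rare combinations score high."""
    n = len(rows)
    scores = [0.0] * n
    for g in groups:
        keys = [tuple(r[j] for j in g) for r in rows]
        freq = Counter(keys)
        for i, k in enumerate(keys):
            scores[i] += 1.0 - freq[k] / n
    return scores


def top_k_outliers(rows, k, threshold=0.1):
    """Return the indices of the k highest-scoring objects."""
    groups = group_features(rows, threshold)
    scores = outlier_scores(rows, groups)
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: two regular patterns plus one object that mixes them.
    rows = [("a", "x", "p")] * 5 + [("b", "y", "q")] * 5 + [("a", "y", "p")]
    print(top_k_outliers(rows, k=1))
```

As in the abstract, the only input a caller chooses here besides the data is the number of outliers `k`; the grouping threshold is a simplification of this sketch and would be replaced by whatever grouping criterion the full paper defines.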
