Abstract
Outlier detection is a key data analysis technique that aims to find unusual data objects in a data set. It has been widely used in areas such as communication networks, finance, medicine, and environmental studies. Many applications in these areas involve categorical data. For example, the data set used for intrusion detection normally consists of captured packets, which tend to have categorical attributes such as “protocol”. Although there are many outlier detection algorithms for numerical data, only a few existing schemes can handle categorical data, and those schemes suffer from two serious problems: low detection precision and high time complexity. In this paper, we present two novel outlier detection algorithms for categorical data sets. First, we describe a simple entropy-based scheme, the Outlier Detection Tree (ODT). ODT constructs a classification tree that divides the data set into two classes, a normal class and an abnormal class; each data object is then identified as an outlier or a normal object using the if-then rules in the tree. We then propose an advanced outlier detection algorithm, FAST-ODT, which achieves both high detection accuracy and low time complexity. Our experimental results indicate that FAST-ODT outperforms existing algorithms in terms of outlier detection precision and computational complexity.
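The abstract does not say how ODT grows its tree or where the normal/abnormal labels come from. As a rough illustration only, the sketch below uses a generic ID3-style tree driven by entropy and information gain (a swapped-in textbook technique, not the authors' ODT or FAST-ODT) and turns each root-to-leaf path into an if-then rule. It assumes the normal/abnormal labels are already given; the packet records, the attributes `protocol` and `flag`, and the helpers `entropy`, `info_gain`, `build_rules`, and `classify` are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on categorical attribute `attr`."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_rules(rows, labels, attrs, conditions=()):
    """Generic ID3 recursion (NOT the paper's ODT); returns (conditions, label) rules."""
    if len(set(labels)) == 1 or not attrs:
        # Leaf: pure node or no attributes left, so emit the majority label.
        return [(conditions, Counter(labels).most_common(1)[0][0])]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    rules = []
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        rules += build_rules([rows[i] for i in idx], [labels[i] for i in idx],
                             remaining, conditions + ((best, value),))
    return rules


def classify(rules, row, default="normal"):
    """Apply the first if-then rule whose conditions all match `row`."""
    for conds, label in rules:
        if all(row.get(a) == v for a, v in conds):
            return label
    return default  # assumption: unseen attribute values are treated as normal


# Hypothetical captured-packet records with pre-assigned labels (an assumption).
rows = [
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "icmp", "flag": "REJ"},
    {"protocol": "tcp", "flag": "REJ"},
]
labels = ["normal", "normal", "normal", "normal", "outlier", "outlier"]

rules = build_rules(rows, labels, ["protocol", "flag"])
for conds, label in rules:
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN {label}")
```

On this toy data, splitting on `flag` removes all label uncertainty, so the tree reduces to two rules: `IF flag = REJ THEN outlier` and `IF flag = SF THEN normal`. The paper's own criteria for building the tree and for labeling the two classes may differ substantially from this sketch.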
Journal: IEEE Transactions on Network Science and Engineering