Abstract

In this paper, we introduce a new approach to semisupervised anomaly detection for categorical data. Given a training set of instances, all belonging to the normal class, we analyze the relationships among features to extract a discriminative characterization of the anomalous instances. Our key idea is to build a model that characterizes the features of the normal instances and then to use a set of distance-based techniques to discriminate between normal and anomalous instances. We compare our approach with state-of-the-art methods for semisupervised anomaly detection and show empirically that a technique designed specifically for categorical data outperforms general-purpose approaches. We also show that, unlike other approaches, which are opaque because their decisions cannot be easily understood, our approach produces a discriminative model that can be easily interpreted and used to explore the data.
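To make the setting concrete, the following is a minimal sketch of a generic distance-based baseline for semisupervised anomaly detection on categorical data. It is not the method proposed in this paper: it simply scores a test instance by its Hamming distance to the nearest normal training instance. The function names and the toy data are hypothetical and chosen only for illustration.

```python
def hamming(a, b):
    """Number of features on which two categorical instances disagree."""
    return sum(x != y for x, y in zip(a, b))


def anomaly_score(normal_rows, row):
    """Distance from `row` to its nearest normal training instance.

    A score of 0 means the instance matches a training instance exactly;
    larger scores mean it differs from every normal instance seen so far.
    """
    return min(hamming(row, ref) for ref in normal_rows)


# Toy training set (assumed data): every instance belongs to the normal class.
normal = [
    ("tcp", "http", "ok"),
    ("tcp", "http", "ok"),
    ("udp", "dns", "ok"),
]

seen_score = anomaly_score(normal, ("tcp", "http", "ok"))
odd_score = anomaly_score(normal, ("udp", "http", "fail"))
```

In practice a threshold on the score separates normal from anomalous instances. A baseline like this treats every feature independently, whereas the abstract describes an approach that also models the relationships among features.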
