Parameter-Free Anomaly Detection for Categorical Data

Shu Wu,Shengrui Wang

doi:10.1007/978-3-642-23199-5_9

Abstract

Outlier detection can usually be considered as a preprocessing step for locating, from a data set, the objects that do not conform to well defined notions of expected behaviors. It is a major issue of data mining for discovering novel or rare events, actions and phenomena. We investigate outlier detection from a categorical data set. The problem is especially challenging because of difficulty in defining a meaningful similarity measure for categorical data. In this paper, we propose a formal definition of outliers and formulize outlier detection as an optimization problem. To solve the optimization problem, we design a practical and parameter-free method, named ITB. Experimental results show that the ITB method is much more effective and efficient than existing main-stream methods.

Full Text