A data-driven preprocessing scheme on anomaly detection in big data applications

Shengjie Xu, Yi Qian, Rose Qingyang Hu

doi:10.1109/infcomw.2017.8116481

Abstract

Efficient anomaly detection has become an urgent and critical topic for big data applications. In this paper, we propose a data-driven preprocessing scheme for anomaly detection that incorporates a dimensionality reduction algorithm, and we present a real-time learning approach for big data applications. Specifically, the scheme combines robust data preprocessing with real-time data learning. The proposed robust preprocessing not only preserves the critical property of dimensionality reduction for high-dimensional data, but also introduces a detection boundary that is robust to the presence of outliers. The real-time learning method is inspired by online learning, which differs from batch-based processing, where learning is performed on an entire batch of data at once. Instead, real-time learning aims to make progress with each example it observes. We discuss the justification for this scheme in detail and present a case study to demonstrate the feasibility of applying it.
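
The contrast between batch processing and learning from each example can be illustrated with a minimal sketch. This is not the scheme proposed in the paper: it assumes a one-dimensional stream, a running mean and variance updated with Welford's algorithm, and a simple z-score boundary. The class name `OnlineDetector`, the `threshold` and `warmup` parameters, and the sample stream are all hypothetical and chosen only for illustration.

```python
import math


class OnlineDetector:
    """Illustrative online detector (an assumption, not the paper's method).

    Keeps a running mean and variance via Welford's algorithm and flags
    values that fall outside mean +/- threshold * std.
    """

    def __init__(self, threshold=3.0, warmup=5):
        self.n = 0            # number of examples learned so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations
        self.threshold = threshold
        self.warmup = warmup  # examples to learn before flagging anything

    def update(self, x):
        """Score x against the current model, then learn from it.

        Returns True if x is flagged as anomalous.
        """
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(x - self.mean) > self.threshold * std
        # Assumed design choice: flagged values are not learned, so a
        # single outlier does not widen the detection boundary.
        if not flagged:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return flagged


# Hypothetical stream: steady readings near 10 with one obvious outlier.
stream = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0]
detector = OnlineDetector()
flags = [detector.update(x) for x in stream]
print(flags)
```

Each call to `update` touches only the current example and three running statistics, so the model improves with every observation without storing or revisiting the full data set, which is the property that distinguishes online learning from batch processing.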

Full Text