Parallel mining of contextual outlier using sparse subspace

Xujun Zhao, Jifu Zhang, Xiao Qin, Jianghui Cai, Yang Ma

doi:10.1016/j.eswa.2019.02.020

Abstract

In this paper, we present a parallel computing solution for LOMA, an existing local outlier detection technique. As datasets grow rapidly in size, there is a pressing demand for highly scalable outlier detection algorithms that leverage modern distributed and parallel computing infrastructures. Devising parallel outlier detection techniques is challenging because sharing and accessing global data across multiple computing nodes inevitably imposes I/O overhead. To address this concern, we design a parallel LOMA computing framework called PLOMA, which is composed of three modules: parallel data reduction, local sparse-subspace construction, and sparse-subspace validation. The parallel data reduction module achieves high efficiency by using a sampling technique to compute the k nearest neighbors. At the heart of PLOMA is the local sparse-subspace construction module, which extends the discrete particle swarm optimization method to search for local sparse subspaces on multiple datanodes in parallel. The validation module allows PLOMA to maintain high mining accuracy by verifying the correctness of the local sparse subspaces obtained by the construction module. Furthermore, PLOMA produces contextual information that serves as an explanation of each detected outlier, thereby achieving high interpretability. We apply the PLOMA algorithm to astronomical spectral data to discover peculiar celestial objects. Our comprehensive experimental study on a 24-node Hadoop cluster confirms that PLOMA achieves high performance in terms of extensibility, scalability, and interpretability.
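The abstract does not spell out how sampling is used in the k-nearest-neighbor computation, so the following is only a minimal illustrative sketch of one generic approach: restricting the candidate neighbors to a random sample so that each query costs time proportional to the sample size rather than the full dataset. The function name `sampled_knn` and its parameters are hypothetical and are not taken from PLOMA.

```python
import math
import random


def sampled_knn(points, k, sample_size, seed=0):
    """Approximate the k nearest neighbors of every point.

    Assumption (not from the paper): candidates are drawn once, uniformly
    at random, and every point is compared only against that sample.
    Returns a dict mapping each point index to a list of neighbor indices.
    """
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), min(sample_size, len(points)))
    neighbors = {}
    for i, p in enumerate(points):
        # Distances from p to each sampled candidate, excluding p itself.
        dists = [(math.dist(p, points[j]), j) for j in sample_idx if j != i]
        dists.sort()
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
approx = sampled_knn(points, k=2, sample_size=4)
exact = sampled_knn(points, k=2, sample_size=len(points))
```

When `sample_size` equals the dataset size the result is the exact neighbor set; smaller samples trade accuracy for fewer distance computations, which is the general motivation for sampling in a data reduction step.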

Full Text