Abstract

In this paper, we propose a scalable mining algorithm that discovers contextual outliers in relevant subspaces. The algorithm is developed with the MapReduce programming model and runs on a Hadoop cluster. Relevant subspaces, which effectively capture the local distribution of a dataset, are quantified by the local sparseness of attribute dimensions. We design a novel way of calculating local outlier factors in a relevant subspace from the probability density of the local dataset; this approach effectively reflects the outlier degree of a data object that does not conform to the distribution of the local dataset in the relevant subspace. The attribute dimensions of the relevant subspace and the local outlier factors are reported as vital contextual information, which improves the interpretability of outliers, and the $N$ data objects with the largest local outlier factor values are identified as contextual outliers. The mining algorithm, which incorporates a locality sensitive hashing distribution strategy, is implemented on a Hadoop cluster. Experimental results on both synthetic data and stellar spectral data validate the effectiveness, interpretability, scalability, and extensibility of the algorithm.
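To make the ranking step concrete, the following is a minimal single-machine sketch, not the paper's implementation: it assumes a Gaussian kernel density estimate stands in for the probability density of the local dataset, takes a relevant subspace as a given list of attribute dimensions, and reports the $N$ objects with the largest outlier factors together with the subspace dimensions as contextual information. The function name, the choice of density estimator, and the inverse-density outlier factor are all illustrative assumptions.

```python
# Illustrative sketch only; the paper's exact density model, relevant-subspace
# selection, and LSH-based MapReduce distribution are not shown here.
import numpy as np
from scipy.stats import gaussian_kde


def top_n_contextual_outliers(data, subspace_dims, n_outliers):
    """Rank objects in a given relevant subspace and return the N objects
    with the largest local outlier factors, with contextual information.

    data          : (m, d) array of m objects with d attributes
    subspace_dims : attribute indices forming the (assumed) relevant subspace
    n_outliers    : number N of contextual outliers to report
    """
    projected = data[:, subspace_dims]          # project onto the subspace
    kde = gaussian_kde(projected.T)             # assumed local probability density
    density = kde(projected.T)                  # density at each object
    factor = 1.0 / (density + 1e-12)            # low density -> large outlier factor
    top_n = np.argsort(factor)[::-1][:n_outliers]
    # Contextual information: subspace dimensions plus the factor value.
    return [(int(i), list(subspace_dims), float(factor[i])) for i in top_n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cluster = rng.normal(0.0, 1.0, size=(200, 3))   # dense local dataset
    strays = rng.uniform(6.0, 8.0, size=(3, 3))     # objects off the distribution
    X = np.vstack([cluster, strays])
    for idx, dims, f in top_n_contextual_outliers(X, [0, 1], n_outliers=3):
        print(f"object {idx}: subspace={dims}, outlier_factor={f:.3f}")
```

In this toy run the three uniformly scattered objects receive the largest outlier factors, and each result carries its subspace dimensions and factor value, mirroring the interpretability claim above; the distributed LSH partitioning on Hadoop is beyond the scope of this sketch.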
