Automated anomaly detection for categorical data by repurposing a form filling recommender system

Hichem Belgacem, Domenico Bianculli, Lionel Briand, Xiaochen Li

doi:10.1145/3696110

Abstract

Data quality is crucial in modern software systems, such as data-driven decision support systems. However, data quality is affected by data anomalies, i.e., instances that deviate from the majority of the data. These anomalies undermine the reliability and trustworthiness of software systems and may propagate, causing further issues. Although many anomaly detection approaches have been proposed, they mainly focus on numerical data. Moreover, the few approaches targeting anomaly detection for categorical data do not yield consistent results across datasets. In this paper, we propose LAFF-AD (LAFF-based Anomaly Detection), a novel anomaly detection approach for categorical data that takes advantage of the learning ability of LAFF, a state-of-the-art form filling tool, to perform value inference on suspicious data. LAFF-AD runs a variant of LAFF that predicts the possible values of a suspicious categorical field in a suspicious instance. It then compares LAFF's output with the value recorded in that instance and uses a heuristic-based strategy to detect categorical data anomalies. We evaluated the effectiveness and efficiency of LAFF-AD on six datasets. Our experimental results show that LAFF-AD accurately detects a wide range of data anomalies, with recall between 0.6 and 1 and precision of at least 0.808. Furthermore, LAFF-AD is efficient, taking at most 7000 s for training and at most 735 ms for prediction.
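To make the "predict, then compare" idea concrete, here is a minimal Python sketch of the general pattern, under stated assumptions. It is not LAFF-AD itself. The function names (`is_anomalous`, `make_cooccurrence_predictor`), the toy co-occurrence predictor used as a stand-in for LAFF, and the `top_k`/`min_confidence` thresholds are all hypothetical. The abstract does not specify LAFF's interface or the exact heuristic LAFF-AD applies.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

# Hypothetical predictor interface (an assumption, not LAFF's real API):
# given the other fields of an instance and a target field name, return
# candidate values for the target field paired with confidence scores.
Predictor = Callable[[Mapping[str, str], str], List[Tuple[str, float]]]


def make_cooccurrence_predictor(rows: Sequence[Mapping[str, str]]) -> Predictor:
    """Toy stand-in for LAFF: scores each candidate value of the target field
    by averaging its co-occurrence frequency with every known context value."""
    # counts[(target_field, other_field, other_value)][target_value] -> count
    counts: Dict[Tuple[str, str, str], Counter] = defaultdict(Counter)
    for row in rows:
        for target, t_val in row.items():
            for other, o_val in row.items():
                if other != target:
                    counts[(target, other, o_val)][t_val] += 1

    def predict(context: Mapping[str, str], field: str) -> List[Tuple[str, float]]:
        scores: Counter = Counter()
        used = 0
        for other, o_val in context.items():
            seen = counts.get((field, other, o_val))  # .get avoids creating entries
            if not seen:
                continue  # context value never observed in training
            total = sum(seen.values())
            for value, n in seen.items():
                scores[value] += n / total
            used += 1
        if used == 0:
            return []
        return [(value, s / used) for value, s in scores.most_common()]

    return predict


def is_anomalous(instance: Mapping[str, str], field: str, predict: Predictor,
                 top_k: int = 2, min_confidence: float = 0.3) -> bool:
    """Hypothetical heuristic: flag the recorded value when it is not among the
    top-k predicted values whose confidence reaches min_confidence."""
    # Hide the recorded value so it is inferred from the other fields only.
    context = {k: v for k, v in instance.items() if k != field}
    candidates = predict(context, field)
    if not candidates:
        return False  # no evidence either way; do not flag
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    accepted = {value for value, conf in ranked if conf >= min_confidence}
    return instance[field] not in accepted


# Illustrative data only (not from the paper's datasets).
train = [
    {"country": "LU", "currency": "EUR", "language": "fr"},
    {"country": "LU", "currency": "EUR", "language": "de"},
    {"country": "CH", "currency": "CHF", "language": "de"},
    {"country": "CH", "currency": "CHF", "language": "fr"},
]
predict = make_cooccurrence_predictor(train)
suspicious = {"country": "LU", "currency": "CHF", "language": "fr"}
normal = {"country": "LU", "currency": "EUR", "language": "fr"}
print(is_anomalous(suspicious, "currency", predict))  # → True
print(is_anomalous(normal, "currency", predict))      # → False
```

In this toy run, the predictor assigns `EUR` a confidence of 0.75 and `CHF` a confidence of 0.25 for the suspicious instance, so the recorded `CHF` falls below the threshold and is flagged. Separating the predictor from the decision heuristic mirrors the structure the abstract describes: a learned value-inference model (LAFF in the paper) whose output a separate rule then compares against the recorded value.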

Journal: Journal of Data and Information Quality

Similar Papers

Anomaly Detection Methods for Categorical Data
Ayman Taha ... Ali S Hadi
ACM Computing Surveys | VOL. 52
30 May 2019

A Heuristic for an Online Applicability of Anomaly Detection Techniques
Ghassan Al-Falouji ... Sven Tomforde
01 Sep 2022

SHARD: A Framework for Sequential, Hierarchical Anomaly Ranking and Detection
Jason Robinson ... Allison Candido
01 Jan 2012

Unsupervised Anomaly Detection in Data Quality Control
Lex Poon ... Na Li
15 Dec 2021