Tabular Data Anomaly Patterns

Dina Sukhobok, Nikolay Nikolov, Dumitru Roman

doi:10.1109/innovate-data.2017.10

Abstract

One essential and challenging task in data science is data cleaning, the process of identifying and eliminating data anomalies. Differences in data types, data domains, data acquisition methods, and the final purposes of data cleaning have led the literature to define data anomalies in different ways. This paper proposes and describes a set of basic data anomalies, expressed as anomaly patterns commonly encountered in tabular data, independently of the data domain, the data acquisition technique, or the purpose of data cleaning. This set of anomalies can serve as a valuable basis for developing and enhancing software products that provide general-purpose data cleaning facilities, and as a basis for comparing tools that support tabular data cleaning. Furthermore, this paper introduces a set of corresponding data operations suitable for addressing the identified anomaly patterns, and presents Grafterizer, a software framework that implements those data operations.

Publication Date: Aug 1, 2017
Citations: 34
License type: other-oa

Similar Papers

Data Preprocessing Toolkit: An Approach to Automate Data Preprocessing
Deepak Varma ... P Swathy
International Journal of Scientific Research in Engineering and Management | VOL. 07
23 Mar 2023

A Data Cleaning Model for Electric Power Big Data Based on Spark Framework
Zhao-Yang Qu ... Chong Wang
International Journal of Database Theory and Application | VOL. 9
31 Mar 2016

Fruit Trees 3D Data Acquisition and Reconstruction Based on Multi-source
Sheng Wu ... Xinyu Guo
01 Jan 2019

Belief based data cleaning for wireless sensor networks
Bakhtiar Qutub Ali ... Niki Pissinou
Wireless Communications and Mobile Computing | VOL. 12
06 Mar 2012
