Cleanix

Hongzhi Wang, Yingyi Bu, Mingda Li, Jiacheng Zhang, Hong Gao, Jianzhong Li

doi:10.1145/2935694.2935702

Journal: ACM SIGMOD Record
Publication Date: May 9, 2016
Citations: 30

Affiliation: Harbin Institute of Technology, University of California, Irvine, Tsinghua University

#Data Cleaning #Abnormal Value Detection

Abstract

For big data, the data quality problem is more serious. A big data cleaning system requires scalability and the ability to handle mixed errors. Motivated by this, we develop Cleanix, a prototype system for cleaning relational Big Data. Cleanix takes data integrated from multiple data sources and cleans them on a shared-nothing machine cluster. The backend system is built on top of an extensible and flexible data-parallel substrate, the Hyracks framework. Cleanix supports various data cleaning tasks such as abnormal value detection and correction, incomplete data filling, de-duplication, and conflict resolution. In this paper, we show the organization, the data cleaning algorithms, and the design of Cleanix.
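The four cleaning tasks named in the abstract can be illustrated on a toy table. The following is a minimal single-machine sketch in Python, not Cleanix's actual algorithms or its Hyracks-based implementation, which the abstract does not describe. All function names, the `age` range check, the median fill, the normalized-name duplicate key, and the majority-vote merge are assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict
from statistics import median


def detect_abnormal(records, field, low, high):
    """Abnormal value detection: blank out values outside [low, high]."""
    return [
        {**r, field: r[field]
         if r[field] is not None and low <= r[field] <= high else None}
        for r in records
    ]


def fill_missing(records, field):
    """Incomplete data filling: replace None with the median of known values."""
    known = [r[field] for r in records if r[field] is not None]
    fill = median(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def deduplicate(records, key_field):
    """De-duplication: group records whose normalized key matches."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key_field].strip().lower()].append(r)
    return list(groups.values())


def resolve_conflicts(group):
    """Conflict resolution: keep the most frequent value of each field."""
    return {
        field: Counter(r[field] for r in group).most_common(1)[0][0]
        for field in group[0]
    }


def clean(records):
    """Run the four illustrative steps in sequence (assumed ordering)."""
    records = detect_abnormal(records, "age", 0, 120)
    records = fill_missing(records, "age")
    return [resolve_conflicts(g) for g in deduplicate(records, "name")]


# Hypothetical input, as if integrated from several sources.
records = [
    {"name": "Ann Lee", "city": "Harbin", "age": 34},
    {"name": "ann lee ", "city": "Harbin", "age": 340},  # abnormal age
    {"name": "Ann Lee", "city": "Irvine", "age": 34},    # conflicting city
    {"name": "Bo Chen", "city": "Beijing", "age": None},  # missing age
    {"name": "Cy Wu", "city": "Harbin", "age": 40},
]
cleaned = clean(records)
```

In this sketch the steps run one after another over an in-memory list. A system of the kind the abstract describes would instead partition the data across a shared-nothing cluster, and it would have to handle interactions between error types (the "mixed errors" requirement), which this toy ordering ignores.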

Full Text