Active Learning for Data Quality Control: A Survey

Na Li, Chaoran Li, Yiyang Qi, Zhiming Zhao

doi:10.1145/3663369

Abstract

Data quality plays a vital role in scientific research and decision-making across industries. It is therefore crucial to incorporate a data quality control (DQC) process, which comprises various actions and operations to detect and correct data errors. The increasing adoption of machine learning (ML) techniques in different domains has raised concerns about data quality in the ML field. Conversely, ML’s capability to uncover complex patterns makes it suitable for addressing challenges in the DQC process. However, supervised learning methods demand abundant labeled data, while unsupervised learning methods rely heavily on the underlying distribution of the data. Active learning (AL) provides a promising solution by proactively selecting data points for inspection, thus reducing the data-labeling burden on domain experts. This survey therefore focuses on applying AL to DQC. Starting with a review of common data quality issues and solutions in the ML field, we aim to enhance the understanding of current quality assessment methods. We then present two scenarios, pool-based and stream-based, that illustrate how AL can be adopted in DQC systems for the anomaly detection task. Finally, we discuss the remaining challenges and research opportunities in this field.
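The pool-based and stream-based scenarios named in the abstract differ mainly in when the query decision is made. The sketch below is a minimal illustration of that difference and is not taken from the survey. The function names, the median-absolute-deviation anomaly score, the threshold, and the margin are all assumptions chosen to keep the example self-contained. In both cases the "most informative" points are taken to be those whose anomaly score lies closest to the decision threshold.

```python
from statistics import median


def anomaly_score(x, center, scale):
    """Absolute deviation from the center, in units of scale (assumed score)."""
    return abs(x - center) / scale


def pool_based_queries(pool, threshold, n_queries):
    """Pool-based AL: rank the whole pool at once and return the points whose
    anomaly score is closest to the threshold (the most uncertain ones)."""
    center = median(pool)
    # Median absolute deviation as a robust scale; fall back to 1.0 if it is zero.
    scale = median(abs(x - center) for x in pool) or 1.0
    ranked = sorted(
        pool, key=lambda x: abs(anomaly_score(x, center, scale) - threshold)
    )
    return ranked[:n_queries]


def stream_based_queries(stream, center, scale, threshold, margin, budget):
    """Stream-based AL: decide for each arriving point, one at a time, whether
    to ask the expert, and stop once the labeling budget is spent."""
    queried = []
    for x in stream:
        if len(queried) >= budget:
            break
        # Query only points that fall inside the uncertainty band.
        if abs(anomaly_score(x, center, scale) - threshold) <= margin:
            queried.append(x)
    return queried


# Hypothetical sensor readings: most values near 10, one borderline, one extreme.
pool = [10.0, 10.4, 9.6, 10.2, 9.8, 11.5, 25.0]
print(pool_based_queries(pool, threshold=3.0, n_queries=1))

stream = [10.1, 11.4, 9.9, 25.0, 11.5, 11.3]
print(stream_based_queries(stream, center=10.2, scale=0.4,
                           threshold=3.0, margin=0.5, budget=2))
```

In this toy setup the extreme reading (25.0) is never sent to the expert, because the scoring rule is already confident about it. Only borderline readings consume the labeling budget, which is the labeling-cost reduction the abstract attributes to AL.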


Journal: Journal of Data and Information Quality
Publication Date: Jun 24, 2024
Citations: 1
License type: mit

Similar Papers

Modified Thermal Lag Correction of CTD Data from Underwater Gliders
Yanhui Wang ... Hualong Liu
Journal of Coastal Research | VOL. 99
14 May 2020

Service observing and data quality control: some lessons learned from the Hubble Space Telescope
Anuradha Koratkar ... Stefano Casertano
03 Jul 1998

OutlierFlag: A Tool for Scientific Data Quality Control by Outlier Data Flagging
Shuai Huang ... Johannes Lüers
Journal of Open Research Software | VOL. 4
31 May 2016

On the data quality and imbalance in machine learning-based design and manufacturing: A systematic review
Yaoyao Fiona Zhao ... Lijun Sun
Engineering
01 Jul 2024
