Building Data Curation Processes with Crowd Intelligence

Tianwa Chen, Shazia Sadiq, Lei Han, Gianluca Demartini, Marta Indulska

doi:10.1007/978-3-030-58135-0_3

Abstract

Data curation processes constitute a number of activities, such as transforming, filtering or de-duplicating data. These processes consume an excessive amount of time in data science projects, due to datasets often being external, re-purposed and generally not ready for analytics. Overall, data curation processes are difficult to automate and require human input, which results in a lack of repeatability and potential errors propagating into analytical results. In this paper, we explore a crowd intelligence-based approach to building robust data curation processes. We study how data workers engage with data curation activities, specifically related to data quality detection, and how to build a robust and effective data curation process by learning from the wisdom of the crowd. With the help of a purpose-designed data curation platform based on iPython Notebook, we conducted a lab experiment with data workers and collected a multi-modal dataset that includes measures of task performance and behaviour data. Our findings identify avenues by which effective data curation processes can be built through crowd intelligence.

Similar Papers

A study on formalizing the knowledge of data curation activities across different fields.
Yasuyuki Minamiyama ... Makoto Asaoka
PLOS ONE | VOL. 19 | 25 Apr 2024

Creating a Medication Therapy Observational Research Database from an Electronic Medical Record: Challenges and Data Curation.
Hans-Ulrich Prokosch ... Wolfgang Rödle
Applied Clinical Informatics | VOL. 15 | 01 Jan 2024

Researchers May Need Additional Data Curation Support
Robin E Miller
Evidence Based Library and Information Practice | VOL. 14 | 14 Mar 2019

On Automating Basic Data Curation Tasks
Seyed-Mehdi-Reza Beheshti ... Reza Nouri
01 Jan 2017
