Abstract

Crowdsourcing is widely used to collect labeled examples for training supervised machine learning models, but the labels obtained from crowd workers are considerably noisier than those from expert annotators. To address this noise, most researchers adopt a repeated-labeling strategy, in which multiple (redundant) labels are collected for each example and then aggregated. Although this improves annotation quality, it reduces the number of distinct training examples when the crowdsourcing budget is fixed, which can hurt the accuracy of the resulting model. This paper empirically examines the extent to which repeated labeling contributes to the accuracy of machine learning models for image classification, named entity recognition, and sentiment analysis under various conditions of budget and worker quality. We experimentally tested four hypotheses concerning the effects of budget, worker quality, task difficulty, and label redundancy. The results on image classification and named entity recognition supported all four hypotheses and suggest that, in terms of model accuracy, repeated labeling almost always has a negative impact. Somewhat surprisingly, the results on sentiment analysis with pretrained models did not support the hypotheses, suggesting that repeated labeling may still be useful in some settings.
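
The following is a minimal sketch, not taken from the paper, of the budget trade-off the abstract describes: with a fixed labeling budget, a redundancy of r leaves budget // r distinct examples, each aggregated here by majority vote. The worker-accuracy parameter p_correct and the aggregation rule are illustrative assumptions, since the paper's exact setup is not specified in the abstract.

```python
from collections import Counter
import random

def simulate_labels(true_label, n_workers, p_correct, n_classes=2):
    """Simulate noisy worker labels: each worker is correct with probability p_correct."""
    labels = []
    for _ in range(n_workers):
        if random.random() < p_correct:
            labels.append(true_label)
        else:
            labels.append(random.choice([c for c in range(n_classes) if c != true_label]))
    return labels

def majority_vote(labels):
    """Aggregate redundant labels by plurality (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def build_training_set(true_labels, budget, redundancy, p_correct):
    """Spend `budget` label acquisitions: budget // redundancy examples,
    each labeled `redundancy` times and aggregated into one training label."""
    n_examples = budget // redundancy
    dataset = []
    for true_label in true_labels[:n_examples]:
        worker_labels = simulate_labels(true_label, redundancy, p_correct)
        dataset.append(majority_vote(worker_labels))
    return dataset

# Example: the same budget yields 1000 singly-labeled examples (redundancy=1)
# or 200 examples with 5 aggregated labels each (redundancy=5).
random.seed(0)
truth = [random.randint(0, 1) for _ in range(1000)]
single = build_training_set(truth, budget=1000, redundancy=1, p_correct=0.7)
redundant = build_training_set(truth, budget=1000, redundancy=5, p_correct=0.7)
print(len(single), len(redundant))  # 1000 200
```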
