A Pitfall of Learning from User-generated Data: In-Depth Analysis of Subjective Class Problem

Kei Nemoto, Shweta Jain

doi:10.1016/j.procs.2021.05.017

Abstract

Research on supervised learning algorithms implicitly assumes that training data is labeled by domain experts, or at least by semi-professional labelers accessible through crowdsourcing services such as Amazon Mechanical Turk. With the advent of the Internet, data has become abundant, and a large number of machine-learning-based systems are now trained on user-generated data in which categorical data serves as the labels. However, little work has been done on supervised learning with user-defined labels, where users are not necessarily experts and may be unable to label some data correctly, or where the labels may contain significant human bias. In this article, we propose two types of classes in user-defined labels, subjective classes and objective classes, and show that objective classes are as reliable as if they had been provided by domain experts, whereas subjective classes are subject to error and bias. We name this the subjective class problem and propose a Normalized Feature Indicative Score that can be effective in detecting subjective classes in a dataset without querying an oracle. This score provides early detection of subjective classes in the data, saving time for data mining practitioners working with data that might contain errors and biases.


Journal: Procedia Computer Science
Publication Date: Jan 1, 2021
Citations: 1
License type: cc-by-nc-nd
