Assessing the difficulty of annotating medical data in crowdworking with help of experiments.

Anne Rother, Tommy Hielscher, Uli Niemann, Myra Spiliopoulou, Henry Völzke, Till Ittermann, Alberto Fernández-Hilario

doi:10.1371/journal.pone.0254764


Open Access

https://doi.org/10.1371/journal.pone.0254764


Abstract

Background: As healthcare-related data proliferate, there is a need to annotate them expertly for the purposes of personalized medicine. Crowdworking is an alternative to expensive expert labour. Annotation corresponds to diagnosis, so comparing unlabeled records to labeled ones seems more appropriate for crowdworkers without medical expertise. We modeled the comparison of a record to two other records as a triplet annotation task, and we conducted an experiment to investigate to what extent sensor-measured stress, task duration, uncertainty of the annotators and agreement among the annotators could predict annotation correctness.

Materials and methods: We conducted an annotation experiment on health data from a population-based study. The triplet annotation task was to decide whether an individual was more similar to a healthy one or to one with a given disorder. We used hepatic steatosis as an example disorder, and described the individuals with 10 pre-selected characteristics related to this disorder. We recorded task duration, electro-dermal activity as a stress indicator, and uncertainty as stated by the experiment participants (n = 29 non-experts and three experts) for 30 triplets. We built an Artificial Similarity-Based Annotator (ASBA) and compared its correctness and uncertainty to those of the experiment participants.

Results: We found no correlation between correctness and any of stated uncertainty, stress, or task duration. Annotator agreement was not predictive either. Notably, for some tasks, annotators agreed unanimously on an incorrect annotation. When controlling for Triplet ID, we identified significant correlations, indicating that correctness, stress levels and annotation duration depend on the task itself. Average correctness among the experiment participants was slightly lower than that achieved by ASBA. Triplet annotation turned out to be similarly difficult for experts as for non-experts.

Conclusion: Our lab experiment indicates that the task of triplet annotation must be prepared cautiously if delegated to crowdworkers. Neither certainty nor agreement among annotators should be assumed to imply correct annotation, because annotators may misjudge difficult tasks as easy and agree on incorrect annotations. Further research is needed to improve visualizations for complex tasks and to decide judiciously how much information to provide. Out-of-the-lab experiments in a crowdworker setting are needed to identify appropriate designs of a human-annotation task, and to assess under what circumstances non-human annotation should be preferred.
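The triplet decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' ASBA, whose details are not given here; it assumes numeric feature vectors, a plain Euclidean distance, and a hypothetical uncertainty score based on how close the two distances are. The function names `euclidean` and `annotate_triplet` are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotate_triplet(query, healthy_ref, disorder_ref):
    """Label the query by whichever reference record is closer.

    Returns (label, uncertainty). The uncertainty is an assumed, illustrative
    measure: 0.0 when the query coincides with one reference, 1.0 when it is
    equidistant from both.
    """
    d_healthy = euclidean(query, healthy_ref)
    d_disorder = euclidean(query, disorder_ref)
    # Ties are resolved in favour of "healthy" (an arbitrary choice).
    label = "healthy" if d_healthy <= d_disorder else "disorder"
    total = d_healthy + d_disorder
    margin = abs(d_healthy - d_disorder) / total if total > 0 else 0.0
    return label, 1.0 - margin


# Hypothetical two-feature records (the study used 10 characteristics).
healthy = [1.0, 2.0]
disorder = [4.0, 6.0]
clear_case = annotate_triplet([1.0, 2.0], healthy, disorder)
ambiguous_case = annotate_triplet([0.0, 0.0], [1.0, 0.0], [-1.0, 0.0])
```

In practice the features would need to be put on a comparable scale before computing distances; that step is omitted here for brevity.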

Highlights

Crowdsourcing is an approach where the wisdom of the crowd is used to solve a specific problem [1].
We modeled the comparison of a record to two other records as a triplet annotation task, and we conducted an experiment to investigate to what extent sensor-measured stress, task duration, uncertainty of the annotators and agreement among the annotators could predict annotation correctness.
When controlling for Triplet ID, we identified significant correlations, indicating that correctness, stress levels and annotation duration depend on the task itself.
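To make the per-task view concrete, the following sketch summarizes annotations per triplet and shows how full agreement can coincide with zero correctness. The data are invented toy values, not results from the study, and "agreement" is assumed here to mean the share of annotators choosing the majority label, which may differ from the measure the authors used.

```python
from collections import Counter, defaultdict


def per_triplet_summary(annotations, truth):
    """Summarize agreement and correctness for each triplet.

    annotations: iterable of (triplet_id, annotator_id, label)
    truth: dict mapping triplet_id to the correct label
    """
    by_triplet = defaultdict(list)
    for triplet_id, _annotator_id, label in annotations:
        by_triplet[triplet_id].append(label)

    summary = {}
    for triplet_id, labels in by_triplet.items():
        majority_label, majority_count = Counter(labels).most_common(1)[0]
        correct = sum(label == truth[triplet_id] for label in labels)
        summary[triplet_id] = {
            "agreement": majority_count / len(labels),
            "correctness": correct / len(labels),
            "majority_correct": majority_label == truth[triplet_id],
        }
    return summary


# Toy data (hypothetical): t1 is unanimously but wrongly annotated.
annotations = [
    ("t1", "a1", "healthy"), ("t1", "a2", "healthy"), ("t1", "a3", "healthy"),
    ("t2", "a1", "healthy"), ("t2", "a2", "healthy"), ("t2", "a3", "disorder"),
]
truth = {"t1": "disorder", "t2": "healthy"}
summary = per_triplet_summary(annotations, truth)
```

Grouping by triplet before looking at agreement or correctness is one simple way to see task-level effects that pooled statistics can hide.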

Summary

Introduction

Crowdsourcing is an approach where the wisdom of the crowd is used to solve a specific problem [1]. Crowdsourcing tasks and their annotation are becoming popular in health and medical research [2, 3]. [...] [14], there are (to the best of our knowledge) no investigations on how crowdworkers would perform in the triplet-based annotation task in the medical context. This implies that models of crowdworkers, as proposed e.g. in [15], cannot be used, since there is no a priori knowledge on the task complexity ‘from a purely objective standpoint’, i.e. from ‘the characteristics of the task alone’ (quoting from [15], preamble of section 3.1). We modeled the comparison of a record to two other records as a triplet annotation task, and we conducted an experiment to investigate to what extent sensor-measured stress, task duration, uncertainty of the annotators and agreement among the annotators could predict annotation correctness.



Journal: PloS one
Publication Date: Jul 29, 2021
Citations: 4
License type: CC BY 4.0


