Is AI Ground Truth Really True? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What

Sarah Lebovitz, Natalia Levina, Hila Lifshitz-Assaf

doi:10.25300/misq/2021/16564

Abstract

Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools out-perform human experts. Yet, measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major U.S. hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts’ know-what knowledge captured in ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI’s know-what and experts’ know-how enabled managers to better understand the risks and benefits of each tool. This study shows dangers of treating ground truth labels used in ML models objectively when the underlying knowledge is uncertain. We outline implications of our study for developing, training, and evaluating AI for knowledge work.

Journal: MIS Quarterly
Publication Date: Sep 1, 2021
Citations: 104

Similar Papers

Clinical ground truth in machine learning for early sepsis diagnosis
Holger A Lindner ... Verena Schneider-Lindner
The Lancet Digital Health | VOL. 5
24 May 2023

Improving Triage Accuracy in Prehospital Emergency Telemedicine: Scoping Review of Machine Learning-Enhanced Approaches.
Daniel Raff ... Kendall Ho
Interactive journal of medical research | VOL. 13
11 Sep 2024

NAPS Fusion: A framework to overcome experimental data limitations to predict human performance and cognitive task outcomes
Nicholas J Napoli ... Angela R Harrivel
Information Fusion | VOL. 91
27 Sep 2022

Does Artificial Intelligence Outperform Natural Intelligence in Interpreting Musculoskeletal Radiological Studies? A Systematic Review.
Olivier Q Groot ... Michiel E R Bongers
Clinical Orthopaedics & Related Research | VOL. 478
30 Jul 2020
