A Survey of Human‐Centered Evaluations in Human‐Centered Machine Learning

F Sperrle, A Endert, G Guo, D Horng Chau, M El‐Assady, R Borgo, D Keim

doi:10.1111/cgf.14329

Open Access

Abstract

Visual analytics systems integrate interactive visualizations and machine learning to enable expert users to solve complex analysis tasks. Applications combine techniques from various fields of research and are consequently not trivial to evaluate. The result is a lack of structure and comparability between evaluations. In this survey, we provide a comprehensive overview of evaluations in the field of human‐centered machine learning. We particularly focus on human‐related factors that influence trust, interpretability, and explainability. We analyze the evaluations presented in papers from top conferences and journals in information visualization and human‐computer interaction to provide a systematic review of their setup and findings. From this survey, we distill design dimensions for structured evaluations, identify evaluation gaps, and derive future research opportunities.

Highlights

Recent advances in artificial intelligence (AI) and machine learning (ML) have led to numerous breakthroughs across many application domains.
While there is no unified definition of human-centered machine learning (HCML), there is a consensus that HCML considers factors pertaining to human involvement in machine learning pipelines, whether as users or as teachers [FG18].
Endert et al. called for a paradigm shift from human in the loop to what they called “the human is the loop” [EHR*14], making a first step from interactive machine learning towards human-centered machine learning.

Summary

Introduction

Recent advances in artificial intelligence (AI) and machine learning (ML) have led to numerous breakthroughs across many application domains. Endert et al. called for a paradigm shift from human in the loop to what they called “the human is the loop” [EHR*14], making a first step from interactive machine learning towards human-centered machine learning. According to their vision, systems should facilitate sensemaking tasks by seamlessly integrating analysis capabilities into existing workflows without disrupting users.

[…; YGLR20]
Justify: [HHC*19; KAY*19; KEV*18; LPH*20; SKB*18; WGZ*19; ZWLC19]
Train: [CHH*19; ESKC18; HKBE12; LPH*20]
Text Data: [ARO*17; BAL*15; CVL*18; EKC*20; ESD*19; ESKC18; ESS*18; HKBE12; JSR*19; LLL*19; MCZ*17; MXC*20; SFB*20; SKB*18; SSBC19; SSKE19]
Geo: [PZDD19]
Images: [CRH*19; CYL*20; KBJ*20; LLS*18; LSC*18; LSL*17; SSSE20; WGSY19; WGYS18; WGZ*19; XCK*20; XXM*19]
Video: [GLC*19; KAY*19; SMD*16]
Multivariate Data: [BHZ*18; BSP20; CD19; CMQ20; CWZ*19; DLW*17; dSBD*12; DSKE20; DVH*19; EKSK18; GZL*20; HHC*19; HOW*19; KAKC18; KAS*20; KEV*18; KPN16; LGM*20; LPH*20; LXL*18; MLMP18; MP13; MQB19; MXLM20; PNKC20; RAL*17; SDMT16; SLC*20; WBL*20; WMJ*19; WSW*18; XMT*20; YGLR20; ZWLC19]
N/A Quality. Observed Quality Transparency condition [dSBD*12; HKBE12], motivated [GZL*20; KAKC18; MCZ*17; SKB*18; SLC*20; SSKE19; WGSY19; WGYS18]
N/A, measured [DLW*17; ESD*19; PNKC20; PZDD19; RAL*17], study condition [dSBD*12], measured condition [CVL*18; ESKC18; HKBE12; MLMP18; SMD*16], motivated [ESS*18; GZL*20; KAS*20; KEV*18; LPH*20; ZWLC19]
N/A, measured [GLC*19], study condition, measured condition [KAS*20], motivated [CRH*19; CWZ*19; DSKE20; EKSK18; ESD*19; ESS*18; GZL*20; MQB19; PNKC20; WMJ*19; WSW*18; XCK*20; ZWLC19]
Trustworthiness: N/A, measured [CRH*19; CWZ*19; DLW*17; HHC*19; HKBE12; SFB*20; SSBC19], study condition, measured condition [CVL*18], motivated [ESD*19; ESKC18; ESS*18; KAKC18; KAS*20; KBJ*20; LSL*17; MCZ*17; MQB19; WBL*20; WMJ*19; XCK*20]
Interpretability: N/A, measured [CWZ*19; DLW*17; DSKE20; EKSK18; GLC*19; HHC*19; SFB*20; XCK*20], study condition [ESKC18; KAS*20], measured condition [CVL*18; YGLR20], motivated [BAL*15; BSP20; CHH*19; CRH*19; CYL*20; EKC*20; ESS*18; GZL*20; JSR*19; KAKC18; KEV*18; KPN16; LLL*19; MCZ*17; MQB19; SLC*20; WGSY19; WGYS18; WSW*18; XMT*20; ZWLC19]
Controllability: N/A, measured [DSKE20; WGSY19], study condition [ESKC18; SFB*20], measured condition [SEH*18], motivated [BSP20; CHH*19; CRH*19; EKC*20; ESD*19; GZL*20; HKBE12; JSR*19; KEV*18; LPH*20; PNKC20; SLC*20; WBL*20; WMJ*19]

Objectives

Results

Conclusion