Evaluating effects of automation reliability and reliability information on trust, dependence and dual-task performance

Qiaoning Zhang, X Jessie Yang, Na Du

doi:10.1177/1541931218621041

Abstract

The use of automated decision aids could reduce human exposure to danger and enable human workers to perform more challenging tasks. However, automation is problematic when people fail to trust and depend on it appropriately. Existing studies have shown that system designs that provide users with likelihood information, including automation certainty, reliability, and confidence, could facilitate trust-reliability calibration, the correspondence between a person’s trust in the automation and the automation’s capabilities (Lee & Moray, 1994), and improve human–automation task performance (Beller et al., 2013; Wang, Jamieson, & Hollands, 2009; McGuirl & Sarter, 2006). While revealing reliability information has been proposed as a design solution, the reported effects of such disclosure vary across studies (Wang et al., 2009; Fletcher et al., 2017; Walliser et al., 2016). Clear guidelines that would allow display designers to choose the most effective reliability information to facilitate human decision performance and trust calibration do not appear to exist. The present study, therefore, aimed to reconcile the existing literature by investigating whether and how different methods of calculating reliability information affect its effectiveness at different levels of automation reliability. A human-subject experiment was conducted with 60 participants. Each participant performed a compensatory tracking task and a threat detection task simultaneously with the help of an imperfect automated threat detector. The experiment adopted a 2×4 mixed design with two independent variables: automation reliability (68% vs. 90%) as a within-subjects factor and reliability information as a between-subjects factor.
Reliability information of the automated threat detector was calculated using different methods based on signal detection theory and the conditional probability formula of Bayes’ theorem (H: hits; FA: false alarms; M: misses; CR: correct rejections): overall reliability = P(H + CR | H + FA + M + CR); positive predictive value = P(H | H + FA); negative predictive value = P(CR | CR + M); hit rate = P(H | H + M); correct rejection rate = P(CR | CR + FA). There was also a control condition in which participants were given no reliability information and were told only that the alerts from the automated threat detector may or may not be correct. The dependent variables of interest were participants’ subjective trust in automation and objective measures of their display-switching behaviors. The results showed that as the automated threat detector became more reliable, participants’ trust in and dependence on the threat detector increased significantly, and their detection performance improved. More importantly, there were significant differences in participants’ trust, dependence, and dual-task performance when reliability information was calculated by different methods. Specifically, when the overall reliability of the automated threat detector was 90%, revealing the positive and negative predictive values of the automation significantly helped participants calibrate their trust in and dependence on the detector, and led to the shortest reaction time for the detection task. However, when the overall reliability of the automated threat detector was 68%, positive and negative predictive values did not lead to a significant difference in participants’ compliance with the detector. In addition, our results showed that disclosing the hit rate and correct rejection rate, or the overall reliability, did not appear to aid human-automation team performance or trust-reliability calibration.
An implication of the study is that users should be made aware of system reliability, especially of positive/negative predictive values, to engender appropriate trust in and dependence on the automation. This can be applied to the interface design of automated decision aids. Future studies should examine whether the positive and negative predictive values are still the most effective pieces of information for trust calibration when the criterion of the automated threat detector becomes liberal.
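
The five reliability measures defined in the abstract can each be read as a ratio of outcome counts. The following is a minimal sketch of that arithmetic, not the authors' implementation: the class name `DetectorOutcomes`, the function name `reliability_measures`, and the example counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorOutcomes:
    """Counts of the four signal-detection outcomes (hypothetical container)."""
    hits: int                # H: threat present, detector alerts
    false_alarms: int        # FA: threat absent, detector alerts
    misses: int              # M: threat present, detector silent
    correct_rejections: int  # CR: threat absent, detector silent


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    if denominator == 0:
        return float("nan")
    return numerator / denominator


def reliability_measures(o: DetectorOutcomes) -> dict:
    """Compute the reliability measures named in the abstract from raw counts."""
    h, fa, m, cr = o.hits, o.false_alarms, o.misses, o.correct_rejections
    return {
        # P(H + CR | H + FA + M + CR): share of all trials judged correctly
        "overall_reliability": _ratio(h + cr, h + fa + m + cr),
        # P(H | H + FA): share of alerts that were real threats
        "positive_predictive_value": _ratio(h, h + fa),
        # P(CR | CR + M): share of non-alerts that were truly clear
        "negative_predictive_value": _ratio(cr, cr + m),
        # P(H | H + M): share of real threats that triggered an alert
        "hit_rate": _ratio(h, h + m),
        # P(CR | CR + FA): share of clear trials that produced no alert
        "correct_rejection_rate": _ratio(cr, cr + fa),
    }


if __name__ == "__main__":
    # Illustrative counts only (assumption): 100 trials, 90 judged correctly.
    example = DetectorOutcomes(hits=40, false_alarms=8, misses=2,
                               correct_rejections=50)
    for name, value in reliability_measures(example).items():
        print(f"{name}: {value:.3f}")
```

With these illustrative counts the overall reliability is 90%, yet the predictive values and the hit and correct rejection rates all differ from it and from one another, which is why the choice of which measure to display is not neutral.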
