Background
Emails have become an integral part of daily life and work. Phishing emails are disguised as trustworthy messages in an attempt to obtain sensitive information for malicious purposes (Egelman, Cranor, & Hong, 2008). Anti-phishing tools have been designed to help users detect phishing emails or websites (Egelman et al., 2008; Yang, Xiong, Chen, Proctor, & Li, 2017). However, like any other type of automation aid, these tools are not perfect. An anti-phishing system can make errors, such as labeling a legitimate email as phishing (i.e., a false alarm) or labeling a phishing email as legitimate (i.e., a miss). Human trust in automation has been widely studied because it affects how the human operator interacts with the automation, which in turn influences overall system performance (Dzindolet, Peterson, Pomranky, Pierce, & Beck, 2003; Lee & Moray, 1992; Muir, 1994; Sheridan & Parasuraman, 2006). When interacting with an automation system, the operator should calibrate his or her trust, trusting a system that is capable and distrusting one that is not (i.e., trust calibration; Lee & Moray, 1994; Lee & See, 2004; McGuirl & Sarter, 2006). Among the various system capabilities, automation reliability is one of the most important factors affecting trust, and it is widely accepted that higher reliability leads to higher trust (Desai et al., 2013; Hoff & Bashir, 2015). How well these capabilities are conveyed to the operator is therefore essential (Lee & See, 2004).

There are two general ways of conveying system capabilities: through an explicit description of the capabilities (i.e., description) or through experiencing the system (i.e., experience). These two ways of conveying information have been studied widely in the human decision-making literature (Wulff, Mergenthaler-Canseco, & Hertwig, 2018). Yet there has been no systematic investigation of these methods of conveying information in the applied area of human-automation interaction (but see Chen, Mishler, Hu, Li, & Proctor, in press; Mishler et al., 2017). Furthermore, trust in and reliance on automation are affected not only by the reliability of the automation but also by the error type, false alarms versus misses (Chancey, Bliss, Yamani, & Handley, 2017; Dixon & Wickens, 2006). False alarms and misses affect human performance in qualitatively different ways, with more serious damage caused by false-alarm-prone automation than by miss-prone automation (Dixon, Wickens, & Chang, 2004). In addition, false-alarm-prone automation reduces compliance (i.e., the operator's response when the automation presents a warning), whereas miss-prone automation reduces reliance (i.e., the operator's inaction when the automation remains silent; Chancey et al., 2017).

Current Study
The goal of the current study was to examine how the method of conveying system reliability and the automation error type affect human decision making and trust in automation. The automation was a phishing-detection system that provided recommendations to users as to whether an email was legitimate or phishing. Automation reliability was defined as the percentage of correct recommendations (60% vs. 90%). For each reliability level, there was a false-alarm condition, in which all automation errors were false alarms, and a miss condition, in which all errors were misses.
System reliability was conveyed either through description (an exact percentage stated to the user) or through experience (immediate feedback to help the user learn; Barron & Erev, 2003). A total of 510 participants were recruited through Amazon Mechanical Turk and completed the experiment online. The task consisted of classifying 20 emails as phishing or legitimate, with the phishing-detection system providing recommendations. At the end of the experiment, participants rated their trust in the automated aid. The measures included a performance measure (the accuracy of participants' decisions) and two trust measures (participants' agreement rate with the phishing-detection system and their self-reported trust in the system). Results showed that higher system reliability and feedback significantly increased accuracy, whereas description or error type alone did not affect accuracy. In terms of the trust measures, false alarms led to lower agreement rates than misses did. With the less reliable system, however, misses produced inappropriately high agreement rates; this problem was reduced when feedback was provided for the unreliable system, indicating a trust-calibration role of feedback. Self-reported trust showed patterns similar to the agreement rates. Performance improved with higher system reliability, feedback, and explicit description. The design implications are that (1) both feedback and a description of the system reliability should be presented in the interface of an automation aid whenever possible, provided the aid is reliable, and (2) for unreliable systems, false alarms are more desirable than misses if one must choose between the two.
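As a rough illustration of the two behavioral measures, the sketch below computes decision accuracy and agreement rate from per-email trial records; the data and all variable names are hypothetical assumptions for illustration only and are not taken from the study.

# Illustrative sketch (not from the study): computing the performance measure
# (decision accuracy) and one trust measure (agreement rate with the aid)
# from hypothetical per-email records of a 20-email classification task.

# Each record holds the true label, the aid's recommendation, and the
# participant's decision; labels are "phishing" or "legitimate".
trials = [
    {"truth": "phishing",   "aid": "phishing",   "decision": "phishing"},
    {"truth": "legitimate", "aid": "phishing",   "decision": "legitimate"},  # aid false alarm
    {"truth": "phishing",   "aid": "legitimate", "decision": "phishing"},    # aid miss
    # ... the remaining emails of the 20-trial block would follow
]

def accuracy(records):
    """Proportion of emails the participant classified correctly."""
    return sum(r["decision"] == r["truth"] for r in records) / len(records)

def agreement_rate(records):
    """Proportion of trials on which the participant's decision matched the aid."""
    return sum(r["decision"] == r["aid"] for r in records) / len(records)

print(f"accuracy = {accuracy(trials):.2f}, agreement = {agreement_rate(trials):.2f}")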