As misinformation proliferates on social media platforms, it has become crucial to explore how best to convey automated news credibility assessments to end-users and how to foster trust in fact-checking AI. In this paper, we investigate how model-agnostic, natural language explanations influence trust in and reliance on a fact-checking AI. We construct explanations from four Conceptualisation Validations (CVs) - consensual, expert, internal (logical), and empirical - the foundational units of evidence that humans use to validate and accept new information. Our results show that providing explanations significantly enhances trust in the AI, even in a fact-checking context where influencing pre-existing beliefs is often challenging, with different CVs inducing varying degrees of reliance. We find consensual explanations to be the least influential, with expert, internal, and empirical explanations exerting twice as much influence. However, we also find that users could not discern whether the AI directed them towards the truth, highlighting the dual capacity of explanations to both guide and mislead. Further, we uncover the presence of both automation bias and automation aversion during collaborative fact-checking, indicating that users' previously established trust in AI can moderate their reliance on its judgements. We also observe a 'boomerang' or backfire effect, often seen with traditional corrections to misinformation, whereby individuals who perceive the AI as biased or untrustworthy double down and reinforce their existing (in)correct beliefs when challenged by it. We conclude by presenting nuanced insights into the dynamics of user behaviour during AI-based fact-checking, offering important lessons for social media platforms.