Abstract

We argue that artificial networks are explainable and offer a novel theory of interpretability. Two sets of conceptual questions are prominent in theoretical engagements with artificial neural networks, especially in the context of medical artificial intelligence: (1) Are networks explainable, and if so, what does it mean to explain the output of a network? And (2) what does it mean for a network to be interpretable? We argue that accounts of “explanation” tailored specifically to neural networks have ineffectively reinvented the wheel. In response to (1), we show how four familiar accounts of explanation apply to neural networks as they would to any scientific phenomenon. We diagnose the confusion about explaining neural networks within the machine learning literature as an equivocation on “explainability,” “understandability” and “interpretability.” To remedy this, we distinguish between these notions, and answer (2) by offering a theory and typology of interpretation in machine learning. Interpretation is something one does to an explanation with the aim of producing another, more understandable, explanation. As with explanation, there are various concepts and methods involved in interpretation: Total or Partial, Global or Local, and Approximative or Isomorphic. Our account of “interpretability” is consistent with uses in the machine learning literature, in keeping with the philosophy of explanation and understanding, and pays special attention to medical artificial intelligence systems.

Highlights

  • Two sets of conceptual problems have gained prominence in theoretical engagements with artificial neural networks (ANNs)

  • These questions are especially pressing in the context of medical artificial intelligence: (1) Are networks explainable, and if so, what does it mean to explain the output of a network? And (2) what does it mean for a network to be interpretable? We argue that accounts of “explanation” tailored specifically to neural networks have ineffectively reinvented the wheel

  • In contrast to Páez’s (2019) claim that traditional explanations of ANNs are impossible, we argue that four such accounts (the Deductive Nomological, Inductive Statistical, Causal Mechanical, and New Mechanist models) apply to neural networks as they would to any scientific phenomenon (Section 3.1)

Summary

Introduction

Two sets of conceptual problems have gained prominence in theoretical engagements with artificial neural networks (ANNs). For some, there is no problem at all, either because we face similar issues when dealing with human decision makers (London 2019; Zerilli et al. 2019) or because simpler models may sometimes achieve the same degree of accuracy as more complex algorithms for the same task (Rudin 2019). All these approaches turn on what we mean by explanation and what features make explanations “good” or “fitting” for a given account. Krishnan (2019) argues that pursuing definitions of interpretability, and related terms like explainability and understandability, is misguided because doing so reduces solutions to opacity-related problems to merely finding ways to make ANNs more transparent. Our account of interpretability is consistent with many uses within AI, in keeping with the philosophy of explanation and understanding, and pays special attention to the accuracy-complexity relationship in medical AI systems (MAIS).
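
To make the typology from the abstract concrete, here is a minimal sketch of what a local, approximative interpretation could look like in practice: an opaque classifier is probed around a single input, and a weighted linear surrogate is fitted to its behavior in that neighborhood, so that the surrogate’s coefficients serve as a more understandable stand-in for the network’s local decision surface. This is an illustrative example only, not a method from the paper; the dataset, the MLP, the Gaussian perturbations, and the proximity weighting are all assumptions chosen for brevity.

```python
# Illustrative sketch of a "local, approximative" interpretation of an opaque model.
# All modeling choices here (dataset, MLP architecture, perturbation scale, kernel width)
# are assumptions made for the sake of a short, runnable example.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# The opaque model whose output we want to interpret.
opaque = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=1000, random_state=0)
opaque.fit(X, y)

# Pick one instance and perturb it to sample the model's local neighborhood.
rng = np.random.default_rng(0)
x0 = X[0]
neighborhood = x0 + rng.normal(scale=0.3, size=(500, X.shape[1]))
probs = opaque.predict_proba(neighborhood)[:, 1]

# Weight perturbed samples by proximity to x0, then fit a simple linear surrogate.
distances = np.linalg.norm(neighborhood - x0, axis=1)
weights = np.exp(-(distances ** 2) / 2.0)
surrogate = Ridge(alpha=1.0)
surrogate.fit(neighborhood, probs, sample_weight=weights)

# The largest-magnitude coefficients indicate which features drive the local output.
top = np.argsort(np.abs(surrogate.coef_))[::-1][:5]
for i in top:
    print(f"feature {i}: local surrogate weight {surrogate.coef_[i]:+.3f}")
```

In the paper’s terms, this sketch is local (it concerns a single case rather than the whole model) and approximative (the surrogate only approximates the network’s behavior); a global or isomorphic interpretation would instead have to cover the model’s behavior everywhere, or mirror its structure exactly.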

The Indefeasibility of Explanation
Four Kinds of Explanation
The Indefeasibility Thesis
The Explanation in the Machine
Medical AI Systems Are Explainable
Separating Explanation and Understanding
What is Interpretation?
Total and Partial Interpretation
Local and Global Interpretation
Interpretation by Approximation or Isomorphism
Findings
Conclusion