A Clinician's Guide to Artificial Intelligence: How to Critically Appraise Machine Learning Studies.

Livia Faes, Xiaoxuan Liu, Dawn A Sim, Pearse A Keane, Konstantinos Balaskas, Dun Jack Fu, Lucas M Bachmann, Alastair K Denniston, Siegfried K Wagner

doi:10.1167/tvst.9.2.7

Abstract

In recent years, there has been considerable interest in the prospect of machine learning models demonstrating expert-level diagnosis in multiple disease contexts. However, there is concern that the excitement around this field may be associated with inadequate scrutiny of methodology and insufficient adoption of good scientific practice in studies involving artificial intelligence in health care. This article aims to empower clinicians and researchers to critically appraise studies of clinical applications of machine learning by: (1) introducing basic machine learning concepts and nomenclature; (2) outlining key applicable principles of evidence-based medicine; and (3) highlighting some of the potential pitfalls in the design and reporting of these studies.

Highlights

Machine learning (ML), a form of artificial intelligence (AI), has generated considerable excitement in recent years through a number of prominent publications demonstrating the ability of ML models to achieve expert-level diagnosis in multiple disease contexts.[1,2,3,4,5,6] The very first AI-based technology approved by the US Food and Drug Administration was an ophthalmic application, IDxDR, an algorithm for screening diabetic retinopathy.[7]
There is concern that the excitement around this field may be associated with inadequate scrutiny of methodology and insufficient adoption of good scientific practice in studies involving artificial intelligence in health care.

Summary


TVST | Special Issue | Vol 9 | No 2 | Article 7

... on deep learning models (an advanced subfield of ML characterized by neural networks).[9]

Although this commonly used technique is helpful in algorithm training, investigators sometimes replicate the class distribution in the validation test set, which is most likely to ensure optimum model performance even if it reflects an unrealistic disease prevalence. Reporting results in this way is somewhat unhelpful because it becomes difficult to judge whether the same level of accuracy can be replicated in a real patient cohort.

Provided we can achieve the appropriate scientific evaluation and real-world regulation of ML health-related interventions, this exciting tool can fulfil its potential to be a powerful technology for patient benefit and health system improvement.
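The point about unrealistic disease prevalence in a test set can be illustrated with a short calculation. The following is a minimal sketch in Python, not taken from the article: the function name `predictive_values` and the figures used (90% sensitivity, 90% specificity, 50% versus 5% prevalence) are hypothetical values chosen only for illustration. It applies the standard definitions of positive and negative predictive value to show how the same model can look quite different once the prevalence changes.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, accuracy) for a binary test at a given prevalence.

    All inputs are proportions; prevalence must lie strictly between 0 and 1.
    The name and signature are illustrative assumptions, not from the article.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    # Expected proportion of the cohort falling in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative result)
    accuracy = true_pos + true_neg            # proportion classified correctly
    return ppv, npv, accuracy


if __name__ == "__main__":
    # Hypothetical model: 90% sensitivity and 90% specificity.
    for label, prevalence in [("balanced test set", 0.50),
                              ("screening cohort", 0.05)]:
        ppv, npv, acc = predictive_values(0.90, 0.90, prevalence)
        print(f"{label}: prevalence={prevalence:.2f} "
              f"PPV={ppv:.3f} NPV={npv:.3f} accuracy={acc:.3f}")
```

Under these assumed figures, the positive predictive value is 0.90 in the balanced set but falls to roughly 0.32 at 5% prevalence, even though sensitivity and specificity are unchanged. This is one reason results reported on an artificially balanced test set are hard to carry over to a real patient cohort.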

Introduction

Findings