Abstract

The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map those data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is vital to reliably quantify their prediction accuracy. Cross-validation (CV) is the standard approach, in which the accuracy of such algorithms is evaluated on a part of the data the algorithm has not seen during training. However, for this procedure to be meaningful, the relationship between the training and the validation set should mimic the relationship between the training set and the data expected in clinical use. Here we compared two popular CV methods: record-wise and subject-wise. While the subject-wise method mirrors the clinically relevant use-case scenario of diagnosis in newly recruited subjects, the record-wise strategy has no such interpretation. Using both a publicly available dataset and a simulation, we found that record-wise CV often massively overestimates the prediction accuracy of the algorithms. We also conducted a systematic review of the relevant literature, and found that this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move towards an era of machine learning-based diagnosis and treatment, using proper methods to evaluate their accuracy is crucial, as inaccurate results can mislead both clinicians and data scientists.
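The overestimation described above can be reproduced with a small simulation. The sketch below is illustrative, not the study's actual simulation: the data are synthetic (one diagnosis per subject, with every record from a subject sharing a strong subject-specific offset that dominates a weak label signal), and the classifier, fold counts, and noise levels are arbitrary choices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, records_per_subject, n_features = 40, 20, 5

# One diagnosis per subject, plus a strong subject-specific "fingerprint"
labels = rng.integers(0, 2, n_subjects)
offsets = rng.normal(0.0, 5.0, (n_subjects, n_features))

X = np.vstack([
    labels[s] * 0.3 + offsets[s]
    + rng.normal(0.0, 1.0, (records_per_subject, n_features))
    for s in range(n_subjects)
])
y = np.repeat(labels, records_per_subject)
groups = np.repeat(np.arange(n_subjects), records_per_subject)

clf = KNeighborsClassifier(n_neighbors=1)

# Record-wise CV: records from the same subject appear in both train and test
record_wise = cross_val_score(
    clf, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()
# Subject-wise CV: each subject's records are confined to a single fold
subject_wise = cross_val_score(
    clf, X, y, groups=groups, cv=GroupKFold(5)).mean()

print(f"record-wise CV accuracy:  {record_wise:.2f}")
print(f"subject-wise CV accuracy: {subject_wise:.2f}")
```

With record-wise splitting, the nearest neighbor of a test record is almost always another record from the same subject, so the classifier recovers the label through subject identity and the accuracy is inflated; with subject-wise splitting that shortcut disappears and the accuracy collapses toward the weak label signal.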

Highlights

  • Machine learning has evolved as the branch of artificial intelligence (AI) that studies how to solve tasks by learning from examples rather than being explicitly programmed

  • The majority of machine learning algorithms used for clinical predictions are based on the supervised learning approach: a set of features is computed from the raw sensor data, and a classifier is trained to map those features to clinical labels

  • We evaluated the reliability of reported accuracies in studies that used machine learning and wearable sensor technology to predict clinical outcomes

Introduction

Machine learning has evolved as the branch of artificial intelligence (AI) that studies how to solve tasks by learning from examples rather than being explicitly programmed. An increasing number of studies apply machine learning to data collected from smartphones and wearable sensors for clinical prediction purposes. The majority of machine learning algorithms used for clinical predictions are based on the supervised learning approach, which can be summarized in the following steps: first, a set of features is computed from the raw sensor data. These features are typically engineered for the specific application; e.g., one feature could be the maximum heart rate in a given time interval. Second, a classifier is trained to map these features to the clinical labels of interest. Once the classifier is trained on enough data, it can be used to perform predictions on new subjects using their features; e.g., do their features predict that they are healthy?
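The supervised pipeline described above can be sketched as follows. Everything here is hypothetical: the synthetic heart-rate traces, the three engineered features, and the random-forest classifier are illustrative choices, not the methods of any particular study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def extract_features(hr_window):
    """Step 1: engineered features from a raw heart-rate window."""
    return [np.mean(hr_window), np.max(hr_window), np.std(hr_window)]

def make_subject(label):
    """Synthetic raw sensor data: affected subjects (1) have a higher resting HR."""
    base = 70 + 15 * label
    return rng.normal(base, 8, size=300), label

# Step 2: train a classifier on features from labeled subjects
train = [make_subject(l) for l in [0, 1] * 10]
X = np.array([extract_features(hr) for hr, _ in train])
y = np.array([label for _, label in train])
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Step 3: predict the clinical outcome for a newly recruited subject
new_hr, true_label = make_subject(1)
pred = clf.predict([extract_features(new_hr)])[0]
```

Note that evaluating `clf` on held-out records from the *training* subjects would correspond to record-wise CV; the clinically relevant question is its accuracy on subjects it has never seen, as in the last step.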

