Abstract

Machine learning has the potential to change the practice of medicine, particularly in areas that require pattern recognition (e.g., radiology). Although automated classification is unlikely to be perfect, few modern machine learning tools can assess their own classification confidence and flag uncertainty that may need human review. Using automated single-channel sleep staging as a first implementation, we demonstrated that uncertainty information (quantified using Shannon entropy) can be used in a “human in the loop” methodology to promote targeted review of uncertain sleep stage classifications on an epoch-by-epoch basis. Across 20 sleep studies, this feedback methodology improved scoring agreement with the gold standard over automated scoring alone (average improvement in Cohen’s Kappa of 0.28), in a fraction of the scoring time required for full manual review (60% reduction). In summary, our uncertainty-based clinician-in-the-loop framework improves medical classification accuracy and confidence in a cost-effective manner.
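As an illustration of the entropy-based flagging described in the abstract, here is a minimal sketch. It assumes the classifier emits a probability vector over sleep stages for each epoch; the function names, the five-stage ordering, the example probabilities, and the 1.5-bit threshold are all hypothetical and are not taken from the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of one epoch's class-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_uncertain_epochs(epoch_probs, threshold_bits):
    """Return indices of epochs whose entropy exceeds the review threshold."""
    return [
        i for i, probs in enumerate(epoch_probs)
        if shannon_entropy(probs) > threshold_bits
    ]

# Hypothetical per-epoch posteriors over five stages (W, N1, N2, N3, REM).
epoch_probs = [
    [0.96, 0.01, 0.01, 0.01, 0.01],  # confident: low entropy
    [0.30, 0.30, 0.20, 0.10, 0.10],  # ambiguous: high entropy
    [0.05, 0.05, 0.80, 0.05, 0.05],  # fairly confident
]

# The threshold is an assumed value chosen only for this example.
flagged = flag_uncertain_epochs(epoch_probs, threshold_bits=1.5)
print(flagged)
```

Only the flagged epochs would be passed to a human scorer, which is where the reduction in review time would come from; how the threshold is chosen in practice is a design decision this sketch does not address.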

Highlights

  • The practices of machine learning and artificial intelligence have seen rapid implementation in many facets of today’s society, spanning fields from industrial automation, smart energy and transportation, and the internet of things to medicine[1]

  • To observe the theoretical maximum benefit provided by algorithm-based uncertainty quantification and manual review, we substituted the sleep stages in uncertain epochs with those in the corresponding epochs of the ground truth scoring

  • These automated + substitution results are illustrated in Fig. 1a for all 20 subjects, stratified by obstructive sleep apnea (OSA) severity class, along with the percentage of each study marked for uncertainty review
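The substitution experiment described in the highlights can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the helper names, the toy stage sequences, and the flagged indices are hypothetical, and Cohen's kappa is computed directly from its standard definition (observed versus chance agreement).

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of the marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)

def substitute_flagged(auto, truth, flagged):
    """Replace automated stages with ground truth at flagged epoch indices."""
    flagged = set(flagged)
    return [truth[i] if i in flagged else s for i, s in enumerate(auto)]

# Toy data: six epochs, two automated errors, only one flagged for review.
truth = ["W", "N1", "N2", "N2", "N3", "REM"]
auto = ["W", "N2", "N2", "N2", "N2", "REM"]
flagged = [1]  # hypothetical output of an uncertainty-flagging step

merged = substitute_flagged(auto, truth, flagged)
kappa_before = cohens_kappa(auto, truth)
kappa_after = cohens_kappa(merged, truth)
print(kappa_before, kappa_after)
```

Because the flagged epochs are overwritten with the ground truth itself, the resulting kappa is an upper bound on what a perfect human reviewer of those epochs could achieve; errors in epochs that were never flagged remain uncorrected.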


Summary

Introduction

The practices of machine learning and artificial intelligence have seen rapid implementation in many facets of today’s society, spanning fields from industrial automation, smart energy and transportation, and the internet of things to medicine[1]. Recent examples demonstrating the promise of machine learning tools in medicine are Google’s classification of cardiovascular risk from retinal images[3] and Apple’s watch-based classification of atrial fibrillation[4]. Each of these examples (and many others) seeks to characterize and identify clinically relevant adverse health outcomes from stores of data acquired both in and out of the hospital, in an attempt to build a prospective classifier for anticipating human health decline. The result is a classification/label of the data (or an estimate of a latent state from which the data were observed) provided in an automated fashion. This basic method of classification can be performed through a variety of machine learning methods, both supervised and unsupervised. Much of this work has been demonstrated on deep learning and reinforcement learning frameworks, with fewer implementations using more “traditional” machine learning methods, which generally perform better on smaller datasets.
