Abstract

Data labeling is often the limiting step in machine learning because it requires time from trained experts. To address the scarcity of labeled data, contrastive learning, among other unsupervised learning methods, leverages unlabeled data to learn representations. Here, we propose a contrastive learning framework that uses metadata to select positive and negative pairs when training on unlabeled data, and we demonstrate its application in the healthcare domain on heart and lung sound recordings. The increasing availability of such recordings, driven by the adoption of digital stethoscopes, presents an opportunity to apply our method. Whereas contrastive learning with augmentations ignores clinical context, our model selects pairs using the clinical information associated with each recording, capturing shared patient-level context such as age, sex, weight, and recording location. We show improved performance on downstream heart and lung sound diagnosis tasks when patient-specific information guides the selection of positive and negative pairs. This study paves the path for medical applications of contrastive learning that leverage clinical information. We have made our code available here: https://github.com/stanfordmlgroup/selfsupervised-lungandheartsounds.
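To make the pair-selection idea concrete, below is a minimal sketch, not the authors' implementation; the field names (`age_group`, `sex`, `recording_location`) and the exact matching rule are illustrative assumptions. Recordings that share clinical context are grouped into positive pairs, and all remaining pairs are treated as negatives.

```python
# Hypothetical sketch of metadata-based contrastive pair selection.
# Field names and the matching criterion are assumptions for illustration.
from itertools import combinations

def same_clinical_context(meta_a, meta_b):
    """Two recordings share context if their clinical metadata match."""
    keys = ("age_group", "sex", "recording_location")
    return all(meta_a[k] == meta_b[k] for k in keys)

def select_pairs(recordings):
    """Split all recording pairs into positives (shared clinical context)
    and negatives (differing context)."""
    positives, negatives = [], []
    for (id_a, meta_a), (id_b, meta_b) in combinations(recordings, 2):
        if same_clinical_context(meta_a, meta_b):
            positives.append((id_a, id_b))
        else:
            negatives.append((id_a, id_b))
    return positives, negatives

# Toy usage: rec1 and rec2 share context, so they form the only positive pair.
recordings = [
    ("rec1", {"age_group": "adult", "sex": "F", "recording_location": "anterior"}),
    ("rec2", {"age_group": "adult", "sex": "F", "recording_location": "anterior"}),
    ("rec3", {"age_group": "child", "sex": "M", "recording_location": "posterior"}),
]
positives, negatives = select_pairs(recordings)
```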

Highlights

  • Data labeling is an expensive and time-consuming process in machine learning

  • We explore the use of clinical information, including age group, sex, and recording location, to create positive and negative pairs of examples, leveraging insights from the clinical context associated with the recordings

  • We show that when using age group, sex, and recording location to select pairs, performance, measured with area under the receiver operating characteristic curve (AUROC), increases to 0.854 (95% confidence interval [CI]: 0.823, 0.882) and 0.863, compared with baseline AUROCs of 0.512 and 0.516, when using 10% and 100% of labeled training data, respectively

Introduction

Data labeling is an expensive and time-consuming process in machine learning. This problem is exacerbated in domains where trained experts are required to label data, such as agriculture, healthcare, and language translation. Contrastive learning, a type of self-supervised learning (SSL), is a potential solution to the problem of limited labeled data: it uses unlabeled data to learn general representations by contrasting similar examples (positive pairs) against dissimilar ones (negative pairs). Previous work has explored methods to leverage metadata associated with unlabeled data in SSL, including encoding genre and playlist associated with song audio for song representation,[5] using patient metadata associated with ultrasound as weak labels,[7] and selecting contrastive pairs based on patient and study information.[8]
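As an illustration of the contrastive objective itself, the following is a minimal InfoNCE-style loss in PyTorch, a standard formulation rather than this paper's exact implementation: the embeddings of each positive pair are pulled together while all other examples in the batch serve as negatives.

```python
# Standard InfoNCE-style contrastive loss (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def info_nce_loss(z_i, z_j, temperature=0.1):
    """z_i, z_j: (batch, dim) embeddings of the two members of each positive pair."""
    z_i = F.normalize(z_i, dim=1)              # unit-norm embeddings
    z_j = F.normalize(z_j, dim=1)
    logits = z_i @ z_j.t() / temperature       # pairwise cosine similarities
    targets = torch.arange(z_i.size(0), device=z_i.device)  # positives on the diagonal
    # Each row's positive is its matching index; all other columns are negatives.
    return F.cross_entropy(logits, targets)
```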

