Abstract

Clinical 12-lead electrocardiography (ECG) is one of the most widely used biosignal modalities. Despite the increased availability of public ECG datasets, label scarcity remains a central challenge in the field. Self-supervised learning represents a promising way to alleviate this issue: it would make it possible to train more powerful models given the same amount of labeled data and to incorporate, or improve predictions for, rare diseases for which training datasets are inherently limited. In this work, we put forward the first comprehensive assessment of self-supervised representation learning from clinical 12-lead ECG data. To this end, we adapt state-of-the-art self-supervised methods based on instance discrimination and latent forecasting to the ECG domain. In a first step, we learn contrastive representations and evaluate their quality based on linear evaluation performance on a recently established, comprehensive clinical ECG classification task. In a second step, we analyze the impact of self-supervised pretraining on finetuned ECG classifiers compared to purely supervised training. For the best-performing method, an adaptation of contrastive predictive coding, we find linear evaluation performance only 0.5% below supervised performance. For the finetuned models, we find improvements in downstream performance of roughly 1% over purely supervised training, together with gains in label efficiency and robustness against physiological noise. This work clearly establishes the feasibility of extracting discriminative representations from ECG data via self-supervised learning, as well as the numerous advantages of finetuning such representations on downstream tasks compared to purely supervised training. As the first comprehensive assessment of its kind in the ECG domain, carried out exclusively on publicly available datasets, we hope it establishes a first step towards reproducible progress in the rapidly evolving field of representation learning for biosignals.
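To make the latent-forecasting idea behind contrastive predictive coding concrete, the sketch below shows a minimal PyTorch-style implementation of such an objective on raw 12-lead segments. The layer sizes, prediction horizon, and use of in-batch negatives are illustrative assumptions and do not correspond to the exact configuration evaluated in this work.

```python
# Minimal sketch of a CPC-style latent-forecasting objective on raw 12-lead ECG.
# All hyperparameters (latent/context sizes, prediction horizon, segment length)
# are illustrative assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPCSketch(nn.Module):
    def __init__(self, n_leads=12, latent_dim=128, context_dim=128, pred_steps=4):
        super().__init__()
        # Strided 1D convolutions map the raw 12-lead signal to a sequence of latents z_t.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_leads, latent_dim, kernel_size=10, stride=4), nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=4, stride=2), nn.ReLU(),
        )
        # An autoregressive network summarizes past latents into a context c_t.
        self.context_rnn = nn.GRU(latent_dim, context_dim, batch_first=True)
        # One linear predictor per forecasting step k.
        self.predictors = nn.ModuleList(
            [nn.Linear(context_dim, latent_dim) for _ in range(pred_steps)]
        )
        self.pred_steps = pred_steps

    def forward(self, x):
        # x: (batch, 12, time) raw ECG segment
        z = self.encoder(x).transpose(1, 2)      # (batch, steps, latent_dim)
        c, _ = self.context_rnn(z)               # (batch, steps, context_dim)
        loss = 0.0
        t = z.size(1) - self.pred_steps - 1      # anchor position for forecasting
        for k, head in enumerate(self.predictors, start=1):
            pred = head(c[:, t])                 # predicted future latent
            target = z[:, t + k]                 # true future latent
            # InfoNCE: the true future of each sample is the positive; the futures
            # of the other samples in the batch act as negatives.
            logits = pred @ target.T             # (batch, batch) similarity matrix
            labels = torch.arange(x.size(0), device=x.device)
            loss = loss + F.cross_entropy(logits, labels)
        return loss / self.pred_steps

# Usage on a random batch of short segments (shapes are illustrative):
model = CPCSketch()
loss = model(torch.randn(8, 12, 250))
loss.backward()
```

For linear evaluation, the pretrained encoder and context network would be frozen and a single linear classifier trained on top of the learned representations; for finetuning, all weights would be updated on the downstream classification task.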

Highlights

  • The availability of datasets with high-quality labels is an omnipresent challenge in machine learning, and especially in the health domain, where labeling is expensive and clinical ground truth is often hard to define

  • We present the first comprehensive assessment of self-supervised representation learning for 12-lead ECG data to foster measurable progress in the subfield of representation learning for biosignals

Introduction

The availability of datasets with high-quality labels is an omnipresent challenge in machine learning in general, but especially in the health domain, where the labeling process is expensive and clinical ground truth is often hard to define. The amount of unlabeled data often exceeds the amount of labeled data by several orders of magnitude, which represents a strong case for (self-supervised) representation learning from unlabeled data. Over the past few years, self-supervised learning has made enormous advances in domains ranging from natural language processing [1] and speech [2] to computer vision [3]. Self-supervised learning could be one component in addressing the problem of data scarcity: it could help to train more accurate and potentially more robust models given the same amount of labeled data, which is a desirable prospect for any application field. Of particular importance for the medical domain are improvements in label efficiency, which could make it possible to train models on more fine-grained and less populated label hierarchies, or to include rare diseases that have so far been out of reach with conventional training methods.

