Background: The 12-lead electrocardiogram (ECG) is an established modality for cardiovascular assessment. While deep learning algorithms have shown promising results for analyzing ECG data, the limited availability of labeled datasets hinders broader applications. Self-supervised learning can learn meaningful representations from unlabeled data and transfer that knowledge to downstream tasks. This study describes the development and validation of a self-supervised learning methodology tailored to produce universal ECG representations from longitudinally collected ECG data, applicable across a spectrum of cardiovascular assessments.

Methods: We introduced a pre-trained model that uses contrastive self-supervised learning to learn universal ECG representations from 4,932,573 ECG tracings from 1,684,298 adult patients across 7 campuses of Chang Gung Memorial Hospital. We extensively evaluated the proposed model using an internal dataset collected from diverse healthcare establishments and an external public dataset encompassing varied cardiovascular conditions and sample sizes.

Results: The pre-trained model performed on par with conventionally trained models, which rely solely on supervised learning, on both the internal and external datasets for assessing atrial fibrillation, atrial flutter, premature rhythm abnormalities, first-degree atrioventricular block, and myocardial infarction. When applied to small sample sizes, the learned ECG representations enhanced the classification models, yielding an improvement of up to 0.3 in the area under the receiver operating characteristic curve (AUROC).

Conclusions: The ECG representations learned from longitudinal ECG data are highly effective, particularly with small sample sizes; they enhance the learning process and boost robustness.
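The abstract does not specify the exact contrastive objective used for pre-training, but a common choice for this style of self-supervised learning is the NT-Xent (normalized temperature-scaled cross-entropy) loss, where two augmented views of the same ECG segment form a positive pair and all other samples in the batch serve as negatives. The sketch below is illustrative only; the function name, temperature value, and NumPy implementation are assumptions, not the authors' actual method.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Illustrative NT-Xent contrastive loss (not the paper's exact objective).

    z1, z2: (N, D) embeddings of two augmented views of the same N ECG
    segments. Row i of z1 and row i of z2 form a positive pair; every
    other row in the batch acts as a negative.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = z1.shape[0]
    # Exclude self-similarity so a sample never counts as its own negative.
    np.fill_diagonal(sim, -np.inf)
    # Each sample's positive partner sits n rows away: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy over similarities: -log softmax(sim)[i, pos[i]].
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return -(sim[np.arange(2 * n), pos] - logsumexp).mean()
```

Minimizing this loss pulls embeddings of two views of the same tracing together while pushing apart embeddings of different tracings, which is what lets the pre-trained encoder transfer to downstream classification tasks with few labels. For example, two slightly perturbed views of the same embeddings yield a much lower loss than two unrelated random batches.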