Abstract

Background: Real-world medical data, such as electrocardiograms (ECGs), often follow a long-tail distribution with severe class imbalance, and severely imbalanced data bias deep learning models. In this work, we investigate how to alleviate the problems of label imbalance and insufficient labelling that deep learning models face when applied to ECG data.

Methods: We constructed a short-duration twelve-lead ECG dataset for morphological recognition, containing more than 300,000 samples and following the actual distribution, to evaluate and compare the ability of humans and computers to recognize ECG morphology. We designed two unique ECG data augmentation methods and combined them with a variety of current mainstream self-supervised learning methods; the pre-trained weights were then transferred to an 8-class multi-label ECG classification task for evaluation.

Results: The experiments showed that self-supervised pre-training relying on negative sample pairs achieved significantly better ECG representations than the baseline. This was effective both in alleviating the imbalance in ECG data and in reducing the number of labelled samples needed for supervised training. The method also made effective use of a large number of normal ECG samples. Additionally, with the expert team's diagnosis as ground truth and with access to only a small number of labelled samples, these models even outperformed the human ECG doctors participating in the test.

Conclusion: Combining self-supervised learning with unique data augmentation methods for ECG morphology recognition can effectively alleviate the long-tail problem and severe data imbalance, and can significantly reduce the need for labelled samples in the downstream task.
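To illustrate what "self-supervised pre-training relying on negative sample pairs" can look like, the following is a minimal sketch of a SimCLR-style NT-Xent contrastive loss in NumPy. It is an assumption for illustration only: the abstract does not name the specific loss, encoder, or temperature used, and the function name `nt_xent_loss` and the default `temperature=0.5` are hypothetical choices. The two inputs stand for encoder embeddings of two augmented views of the same batch of ECG recordings; the augmentations themselves are not shown, since the abstract does not describe them.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Illustrative NT-Xent contrastive loss (not necessarily the paper's loss).

    z_a, z_b: arrays of shape (n, d). Row i of each array is the embedding of
    a different augmented view of the same ECG recording (a positive pair).
    All other rows in the batch act as negatives.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)            # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> cosine similarity
    sim = (z @ z.T) / temperature                     # (2n, 2n) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity

    # Row i is paired with row i + n, and row i + n with row i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positive partner.
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

Minimizing this quantity pulls the two views of each recording together and pushes them away from the other recordings in the batch, so it needs no diagnostic labels. That property is what would let such pre-training draw on a large pool of unlabelled or normal ECG samples before fine-tuning on the labelled multi-label task.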
