Abstract

Background
Within cardiovascular medicine, AI-driven electrocardiogram (ECG) analysis has emerged as a powerful tool capable of predicting clinically significant abnormalities. The growth of wearable technologies for ECG monitoring underscores the importance of advancing AI methodologies. While Convolutional Neural Networks (CNNs) have traditionally been the go-to choice for ECG analysis, the rise of Vision Transformers (ViTs), a newer computer vision paradigm, raises the question of whether they perform better.

Purpose
To compare the performance of CNNs and ViTs in ECG analysis, and to assess the efficiency of lean models that use truncated ECG recordings.

Methods
We trained models that take 12-lead ECGs as input for two binary classification tasks: sex (male or female) and left ventricular dysfunction (LVD, defined as an ejection fraction below 35%), determined by echocardiography within 2 weeks of the ECG. Models were developed using full standard ECGs, single-lead recordings, and truncated recordings (1, 2 and 4 of the standard 10 seconds). We compared the performance of CNNs (PyTorch) and ViTs (HuggingFace). We also assessed explainability by visualizing the regions of the ECG on which the model focused in normal cases versus cases with reduced ejection fraction, as a check on its diagnostic discernment (Fig 1).

Results
We identified 150,691 and 29,422 patients with valid ECG tests for the sex and LVD prediction models, respectively. For the sex outcome, the AUROCs were 0.911 (95% CI: 0.908-0.914) for the CNN-based model and 0.898 (95% CI: 0.894-0.901) for the ViT-based model, a statistically significant difference (P < 0.001). For the LVD outcome, the AUROCs were comparable at 0.878 (95% CI: 0.864-0.892) for the CNN-based model and 0.866 (95% CI: 0.853-0.879) for the ViT-based model (P = 0.056). Furthermore, 98.8% of the maximal predictive power of the models was achieved within a 2-second time frame of a 12-lead ECG. Similarly, a single-lead model achieved a relatively high AUROC for the prediction of LVD.

Conclusions
AI-ECG models developed with the ViT architecture did not surpass CNN-based models in sex classification or LVD identification. Our study highlights the significance of lean models that allow rapid prediction from 2-second tracings or from a single lead, demonstrating the potential for streamlined AI-ECG applications in clinical practice.
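To make the evaluation steps concrete, the following is a minimal sketch of how a truncated recording and an AUROC with a 95% confidence interval could be computed. It is not the authors' pipeline. The function names (`truncate_ecg`, `auroc`, `bootstrap_ci`, `retained_fraction`), the 500 Hz sampling rate, and the `(leads, samples)` array layout are all assumptions made for illustration. The abstract does not state how its confidence intervals were obtained, so the percentile bootstrap used here is an assumption, as is reading "98.8% of the maximal predictive power" as the ratio of the truncated-model AUROC to the full-recording AUROC.

```python
import numpy as np


def truncate_ecg(ecg, sampling_rate_hz, seconds):
    """Keep the first `seconds` of an ECG stored as a (leads, samples) array."""
    n_samples = int(round(sampling_rate_hz * seconds))
    if n_samples <= 0 or n_samples > ecg.shape[1]:
        raise ValueError("requested window does not fit the recording")
    return ecg[:, :n_samples]


def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    Pairwise comparison is O(n_pos * n_neg), which is fine for a sketch.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC (assumed method)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    n = labels.size
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        if labels[idx].min() == labels[idx].max():
            continue  # resample contains a single class; AUROC is undefined
        stats.append(auroc(labels[idx], scores[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


def retained_fraction(auroc_truncated, auroc_full):
    """Share of the full-recording AUROC kept by a truncated model (assumed definition)."""
    return auroc_truncated / auroc_full


if __name__ == "__main__":
    # Synthetic stand-in data; no values here come from the study.
    rng = np.random.default_rng(1)
    ecg = rng.standard_normal((12, 5000))       # 12 leads, 10 s at an assumed 500 Hz
    two_seconds = truncate_ecg(ecg, 500, 2)     # first 2 s of every lead
    print(two_seconds.shape)

    y = rng.integers(0, 2, size=300)
    p = 0.6 * y + rng.standard_normal(300)      # noisy scores correlated with the label
    print(auroc(y, p), bootstrap_ci(y, p, n_boot=200))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would replace the pairwise `auroc` above for large cohorts. A formal comparison of two correlated AUROCs, as reported in the Results, requires a paired test that this sketch does not implement.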