Abstract

Background
Conventional analysis of the electrocardiogram (ECG) is based on the extraction of human-defined, visually recognisable features. However, these are not optimised to capture all the information content in the ECG. The variational autoencoder (VAE), a form of unsupervised machine learning, can address this shortcoming by computationally extracting comprehensive and interpretable new ECG features, called latent factors. These latent factors provide a low-dimensional representation of the ECG that maximises capture of data content. This approach could expand our understanding of electrophysiology by identifying novel genetic and phenotypic trait associations with the ECG.

Purpose
This study aims to uncover genetic determinants of VAE latent factors, comparing them to determinants of conventional, human-derived ECG parameters to gain novel insights into cardiac electrophysiology and related diseases.

Methods
Over one million median beat ECGs derived from a United States (US) based secondary care centre were used to train a machine learning VAE model, with external validation in the UK Biobank (UKB). We performed common and rare variant association studies for VAE latent factors and conventional ECG traits on quality-controlled UKB data. Associated genetic variants were compared to loci for conventional ECG parameters available in the UKB and literature. Novel GWAS associations were validated in a withheld subset of the UKB cohort. Additionally, we compared the associations of the VAE latent factors and conventional ECG traits with phenotypic traits, disease codes and echocardiographic traits.

Results
Utilising median beat ECGs, the VAE model identified 20 features, with partial correlations to traditional ECG traits. A genome-wide association study (GWAS) identified 105 associated genomic regions, compared with 62 regions identified with conventional ECG traits on the same dataset. Most discovered loci have been previously associated with ECG features in larger GWAS, providing a positive control validation. We also discovered and validated five genomic regions not previously linked to the ECG (mapped to TRIOBP, EFEMP1, NEDD9, ATP10A and P2RX1). We found rare variant associations for seven genes (NEK6, IL17RA, NME7, MYBPC3, CCT8, ADAMTS6 and SCN5A). The latent factors uncovered associations with 158 phenotypic traits (2093 phenotypes tested) and 147 disease codes (2074 disease phecodes tested) which were not identified by ECG traits. Out of 38 echocardiographic traits, 35 showed higher correlations with the latent factors, while six exhibited significant associations exclusively with the latent factors.

Conclusion
Our study demonstrates that VAE-derived latent factors outperform conventional ECG parameters in uncovering novel genetic and phenotypic associations. Our findings underscore the opportunity for optimised data-driven feature extraction to enhance our understanding of cardiac electrophysiology.