Abstract
Genetics is estimated to account for 50% of coronary artery disease (CAD) risk. Genome-wide association studies (GWAS) have revealed hundreds of genomic loci associated with CAD (1). However, moving from GWAS signals to causal genes has proved challenging, slowing the development of interventions for CAD and other diseases. The greatest challenge in interpreting GWAS data is that only 10% of variants lie in protein-coding regions, while 90% lie in non-coding genomic regions. Many of these non-coding variants are found in regulatory elements such as enhancers. Enhancers are distal regulatory elements that bind transcription factors and can regulate genes up to 1 million base pairs away. They do so through DNA looping, which brings enhancers into physical proximity with their target genes. However, it is not possible to predict which gene(s) a given enhancer regulates from genomic location alone. We aimed to develop a generalisable workflow to resolve CAD GWAS signals accurately and sensitively, by identifying causal enhancer SNPs and linking them mechanistically to their output genes in a cell-type-specific manner. We combined cell-specific epigenomic mapping with machine learning to prioritise 21,959 SNPs from 861 candidate CAD loci in human umbilical vein endothelial cells (HUVECs) and human coronary artery smooth muscle cells (HCASMCs). Micro-Capture-C (MCC) was used to resolve enhancer-promoter interactions at base-pair resolution (2). MNase footprinting was used to detect differential transcription factor binding at specific enhancer SNPs. In HCASMCs, 20 putative enhancers containing CAD SNPs were found to interact with 32 genes; in HUVECs, 25 enhancers interacted with 48 genes. These genes were enriched for pathways including TGF-β signalling, RUNX3 activity, and MAPK6/MAPK4 signalling. To establish causality at specific loci, we identified SNPs that were heterozygous in the individual primary cell lines and statistically compared the interaction frequency at these elements between the risk and non-risk alleles.
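The abstract does not state which statistical test was used to compare interaction frequency between alleles. As a minimal sketch, assuming the comparison is framed as counts of MCC interaction reads carrying each allele at a heterozygous SNP, an exact two-sided binomial test against an expected 50:50 split could look like this. The function name and all counts are hypothetical and are not the authors' method or data.

```python
from math import comb


def allelic_imbalance_pvalue(risk_reads: int, nonrisk_reads: int) -> float:
    """Hypothetical helper: exact two-sided binomial test of H0 that each
    allele of a heterozygous SNP contributes 50% of the interaction reads."""
    n = risk_reads + nonrisk_reads
    if n == 0:
        raise ValueError("no allele-informative reads at this SNP")
    # Smaller of the two allele counts: the tail we measure against.
    k = min(risk_reads, nonrisk_reads)
    # P(X <= k) under Binomial(n, 0.5), computed exactly with integers.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Binomial(n, 0.5) is symmetric, so the two-sided p-value is twice the
    # tail; cap at 1 because balanced counts double-count the central term.
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Illustrative (made-up) counts: 2 risk-allele vs 8 non-risk-allele reads.
    print(allelic_imbalance_pvalue(2, 8))
```

With these illustrative counts the function returns 0.109375. A real analysis would also need to handle biological replicates, reference-allele mapping bias, and correction for testing many SNPs.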
For example, at the 9p21.3 locus, an enhancer carrying the non-risk T allele interacted more strongly with the CDKN2B gene than the risk G allele did. Interestingly, this allelic imbalance in interaction was potentiated by stimulation with TGF-β, a key mediator in atherosclerosis. Furthermore, the risk allele altered the transcription factor footprint at the SNP, indicating the mechanism underlying the CAD-associated change in CDKN2B expression. We demonstrate a combined computational and experimental approach to identify causal non-coding variants in CAD risk loci at scale. Key outputs of this study include the identification of the risk variant (rs1537373), output gene (CDKN2B), and specific cell type (HCASMC) underlying the 9p21.3 association with CAD risk. Using this approach to translate GWAS signals into cell-type-specific output genes will lead to a better understanding of CAD pathogenesis and to new therapeutic opportunities.