Abstract
Genome-wide association studies (GWAS) have identified over 170 genetic variants associated with breast cancer risk. Integrative methods, such as colocalization and transcriptome-wide association studies (TWAS), can identify potential biological mediators of genetic associations but, to date, have contextualized only a fraction of GWAS loci. This reduced discovery is due to an overreliance on reference transcriptomic panels of total gene expression, which ignore widespread alternative splicing that gives rise to multiple transcript-isoforms of the same gene with differing effects on tumorigenesis and progression. However, integrating highly correlated transcript-isoforms with phenotypic associations from GWAS requires methodological innovations. Here, we first introduce isoTWAS, a multivariate, machine learning method to integrate genetics, isoform-level expression, and phenotypic associations in a step-wise testing framework, and present its advantages over traditional gene-level methods. We then apply isoTWAS to multiomic data from non-cancerous breast samples from the Genotype-Tissue Expression Project (GTEx) and breast cancer risk GWAS summary statistics from the Breast Cancer Association Consortium. In total, we generated strong predictive models for 70,816 isoforms across 12,103 genes and estimated isoform associations for overall and subtype-specific breast cancer risk. We observed 570 associated isoforms across 384 unique genes (compared to 141 associated genes out of 14,613 genes tested in a gene-level TWAS with the same GWAS) at FDR- and FWER-corrected P < 0.05. Critically, isoTWAS prioritized isoform associations within 1 megabase of 66 independent GWAS-significant loci, compared to 22 from gene-level TWAS. Of these isoforms within 1 megabase of a GWAS locus, 19 associated isoforms had a significantly associated risk variant within the gene body.
Of note, we find a negative association between the ESR1 isoform ENST00000488573 and overall breast cancer risk (P = 3.35 x 10^-9). Previous studies using breast tissue have not prioritized ESR1 expression as a potential mediator, despite strong SNP associations at this locus. We are currently updating transcriptomic annotations using long-read RNA-sequencing data to improve the precision and specificity of GTEx isoform-level quantifications. We will then expand our isoTWAS analysis to other related tissues (connective, adipose, and immune-related tissues) to identify causal susceptibility transcripts for overall and subtype-specific risk. Our study underscores the wealth of information remaining in bulk RNA-seq datasets that sophisticated methods like isoTWAS can leverage to identify genetic risk mechanisms for breast cancer that are missed by traditional gene-centric measurements and methods.

Citation Format: Arjun Bhattacharya, Yung-Han Chang, Bogdan Pasaniuc, Sara Lindstroem. Isoform-level transcriptome-wide analyses identify extensive genetic mechanisms for breast cancer risk undetectable on the gene-level [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4332.