Abstract

Identifying transcription factors (TFs) whose DNA bindings are altered by genetic variants that regulate susceptibility genes is imperative to understand transcriptional dysregulation in disease etiology. Here, we develop a statistical framework to analyze extensive ChIP-seq and GWAS data and identify 22 breast cancer risk-associated TFs. By analyzing genetic variations of TF-DNA bindings, we find that the interaction of FOXA1 with co-factors such as ESR1 and E2F1, and the interaction of TFs with chromatin features (i.e., enhancers), play a key role in breast cancer susceptibility. Using genetic variants occupied by the 22 TFs, transcriptome-wide association analyses identify 52 previously unreported breast cancer susceptibility genes, including seven with evidence of essentiality from functional screens in breast-relevant cell lines. We show that FOXA1 and its co-factors form a core TF-transcriptional network regulating the susceptibility genes. Our findings provide additional insights into genetic variations of TF-DNA bindings (particularly for FOXA1) underlying breast cancer susceptibility.

Highlights

  • We used generalized mixed models to estimate the association between the Chi-squared values (Y) and the transcription factor (TF) binding status of genetic variants located in the binding sites of each TF, conditioning on LD blocks to handle the dependence between genetic variants (Fig. 1c and Eq. 1).

  • In regular TWAS approaches, the accuracy of a prediction model built on cis-genetic variants can be low or compromised if those variants fall in nonregulatory elements, or if they disrupt binding sites of TFs that are not transcribed in the target tissue.

  • We demonstrated that TWAS analysis using genetic variants located in binding sites of risk-associated TFs significantly improved the detection of breast cancer susceptibility genes.
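
The model described in the first highlight can be illustrated with a random-intercept form. This is an assumed, simplified version and not a reproduction of the paper's Eq. 1: the indices, coefficient names, and Gaussian random effect are illustrative notation, and the generalized model used in the paper may rely on a different link function or error distribution.

```latex
% Illustrative notation only (assumed, not taken from the paper):
%   i indexes genetic variants, j indexes LD blocks,
%   Y_{ij} is the GWAS Chi-squared value of variant i in block j,
%   X_{ij} = 1 if the variant lies in a binding site of the TF, 0 otherwise.
\begin{align}
  Y_{ij} &= \beta_0 + \beta_1 X_{ij} + u_j + \varepsilon_{ij}, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Under this reading, the block-level term u_j absorbs the correlation among variants that share an LD block, and a positive estimate of β1 would indicate that variants bound by the TF tend to carry stronger breast cancer association signals.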

Summary

Results

To investigate how genetic variations of TF-DNA bindings affect breast cancer susceptibility, we developed an analytic framework to analyze ChIP-seq data and breast cancer GWAS summary statistics (Fig. 1a–c). We generated a “deflated” genome (Fig. 1d, red line) based on a random uniform distribution of GWAS P-values, obtained by removing variants, mostly those with small P-values for breast cancer risk, in each block (see “Methods” section). In this “deflated” genome, genetic variations of TF-DNA bindings for 17 TFs still remained significant at a nominal P < 0.05.

Motif-dependent genetic variations of TF-DNA bindings of breast cancer risk-associated TFs

Genomic annotation of the binding sites of the 22 identified TFs revealed that they are generally significantly enriched in intragenic regions. We observed a substantial proportion of genetic variants located in co-occupied binding sites.
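
The “deflated” genome step can be pictured with a small Python sketch. It rests on assumptions that are not taken from the paper: P-values are converted to 1-df Chi-squared values, and the variants with the smallest P-values in a block are dropped until the genomic inflation factor (median Chi-squared divided by its expected median under the null) is at most 1. The function names `p_to_chi2`, `inflation_factor`, and `deflate_block`, and the stopping rule itself, are hypothetical; the actual procedure is the one given in the paper's Methods.

```python
from statistics import NormalDist, median


def p_to_chi2(p):
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df Chi-squared value."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


# Median of a 1-df Chi-squared distribution, i.e. the value implied by p = 0.5.
CHI2_1DF_MEDIAN = p_to_chi2(0.5)


def inflation_factor(p_values):
    """Genomic inflation factor: observed median Chi-squared over the null median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


def deflate_block(p_values, target=1.0):
    """Drop the smallest P-values in one block until inflation is <= target.

    Assumed stopping rule, for illustration only. At least one variant is
    always kept so that the block does not become empty.
    """
    kept = sorted(p_values)
    while len(kept) > 1 and inflation_factor(kept) > target:
        kept.pop(0)  # remove the most significant remaining variant
    return kept


if __name__ == "__main__":
    block = [1e-8, 1e-6, 1e-4, 0.001, 0.01, 0.2, 0.5, 0.9]
    deflated = deflate_block(block)
    print(len(block), "variants before,", len(deflated), "after deflation")
```

Applying such a function block by block yields a P-value set with no residual inflation, which is the kind of conservative background against which the 17 TFs are reported to remain significant.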

Discussion
Methods
Code availability
