Abstract

Genome-wide association studies (GWAS) have discovered thousands of loci associated with disease risk and quantitative traits, yet most of the variants responsible for risk remain uncharacterized. The majority of GWAS-identified loci are enriched for non-coding single-nucleotide polymorphisms (SNPs), and defining the molecular mechanism of risk is challenging. Many non-coding causal SNPs are hypothesized to affect organismal phenotypes by altering transcription factor (TF) binding sites. We employed an integrative genomics approach to identify candidate TF binding motifs that confer breast cancer-specific phenotypes identified by GWAS. We performed de novo motif analysis of regulatory elements, analyzed evolutionary conservation of the identified motifs, and assayed TF footprinting data to identify sequence elements that recruit TFs and maintain the chromatin landscape in breast cancer-relevant tissue and cell lines. We identified candidate causal SNPs that are predicted to alter TF binding within breast cancer-relevant regulatory regions and that are in strong linkage disequilibrium with significantly associated GWAS SNPs. Using CTCF ChIP-seq data, we confirmed that the TFs bind with the predicted allele-specific preferences. We used The Cancer Genome Atlas breast cancer patient data to identify ANKLE1 and ZNF404 as the target genes of candidate TF binding site SNPs in the 19p13.11 and 19q13.31 GWAS-identified loci. These SNPs are associated with the expression of ZNF404 and ANKLE1 in breast tissue. This integrative analysis pipeline is a general framework for identifying candidate causal variants within regulatory regions and TF binding sites that confer phenotypic variation and disease risk.

Highlights

  • Genome-wide association studies (GWAS) have identified more than 90 genomic loci and common genetic variants associated with breast cancer [1,2,3,4,5]

  • We identified candidate causal single nucleotide polymorphisms (SNPs) that are predicted to alter transcription factor (TF) binding within breast cancer-relevant regulatory regions that are in strong linkage disequilibrium with significantly associated GWAS SNPs

  • We sought to identify all TFs that bind within open chromatin in breast cancer-relevant cells and tissues

Introduction

Genome-wide association studies (GWAS) have identified more than 90 genomic loci and common genetic variants associated with breast cancer [1,2,3,4,5]. The single nucleotide polymorphisms (SNPs) associated with breast cancer have been shown to be enriched in DNA regulatory regions [6, 7], with few residing in the coding regions of genes. The effects of putative causal non-coding SNPs are challenging to interpret because they may alter transcription factor (TF) binding sites [12], lncRNA structure [13], splicing [14], transcription start or termination signals, or DNA shape [15]. Non-coding SNPs that alter TF binding sites are the most readily interpreted because they have the potential to modulate gene expression and thereby mediate their effects on disease risk [16]. It is possible to identify putative causal SNPs by focusing on those that alter TF binding sites in breast tissue
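To make the idea of a SNP "altering a TF binding site" concrete, the following is a minimal sketch, not the authors' pipeline. It scores the reference and alternate alleles against a position weight matrix (PWM) and reports the change in the best log-odds score among the windows that overlap the SNP. The matrix `TOY_PWM`, the uniform background frequency, the function names, and the example sequence are all hypothetical and chosen only for illustration. The sketch scans a single strand; a real analysis would also scan the reverse complement and use motifs derived from data.

```python
import math

# Hypothetical 4-position motif (toy probabilities, not taken from the paper).
# Its consensus sequence is "ACGT".
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(window, pwm):
    """Sum of per-position log2(motif probability / background probability)."""
    return sum(math.log2(pwm[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq, snp_index, pwm):
    """Best log-odds score among all motif-width windows that overlap snp_index."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    return max(log_odds(seq[s:s + width], pwm) for s in range(first, last + 1))


def allele_score_delta(seq, snp_index, alt, pwm):
    """Score change (alternate minus reference) when seq[snp_index] becomes alt."""
    alt_seq = seq[:snp_index] + alt + seq[snp_index + 1:]
    return best_score(alt_seq, snp_index, pwm) - best_score(seq, snp_index, pwm)


# Hypothetical reference sequence containing the consensus "ACGT" at positions 2-5.
reference = "TTACGTTT"
delta = allele_score_delta(reference, snp_index=3, alt="T", pwm=TOY_PWM)
print(f"log-odds change for C>T at index 3: {delta:.3f}")
```

A negative change indicates that the alternate allele is predicted to weaken the match to the motif, and a positive change indicates a stronger match. Taking the best overlapping window, rather than a fixed window, allows for the case where the alternate allele shifts the strongest match to a neighboring position.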
