Abstract

The majority of the single nucleotide variants (SNVs) identified by genome-wide association studies (GWAS) fall outside of protein-coding regions. Elucidating the functional implications of these variants has been a major challenge. A possible mechanism for functional non-coding variants is that they disrupt canonical transcription factor (TF) binding sites and thereby affect the in vivo binding of the TF. However, their impact varies, since many positions within a TF binding motif are not well conserved. Therefore, simply annotating all variants located in putative TF binding sites may overestimate the functional impact of these SNVs. We conducted a comprehensive survey to study the effect of SNVs on TF binding affinity. A sequence-based machine learning method was used to estimate the change in binding affinity for each SNV located inside a putative motif site. From the results obtained on 18 TF binding motifs, we found substantial variation in an SNV's impact on TF binding affinity. We found that only about 20% of SNVs located inside putative TF binding sites are likely to have a significant impact on TF-DNA binding.

Highlights

  • Thousands of genome-wide association studies (GWAS) have been conducted over the past 15 years, resulting in a considerable number of single nucleotide variants (SNVs) being identified as robustly associated with a wide array of phenotypes (Welter et al, 2014)

  • The authors took advantage of sequencing-based assays, such as ChIP-seq (Johnson et al, 2007), which can detect transcription factor (TF) binding in vivo, to define gapped k-mer support vector machine weights that quantify the differing abundance of k-mers at functionally important genomic loci

  • All SNVs that fall into putative TF binding sites, called by position weight matrix (PWM) scan, are considered to affect transcriptional regulation


Summary

Introduction

Thousands of genome-wide association studies (GWAS) have been conducted over the past 15 years, resulting in a considerable number of single nucleotide variants (SNVs) being identified as robustly associated with a wide array of phenotypes (Welter et al, 2014). Highly informative PWMs exist for many TF binding sites in databases such as TRANSFAC (Wingender et al, 2000), Factorbook (Wang et al, 2012), JASPAR (Sandelin et al, 2004), HOCOMOCO (Kulakovskiy et al, 2017), and CIS-BP. These resources are designed for characterizing motif patterns; their effectiveness for measuring the impact of mutations has yet to be investigated. We aim to evaluate whether measuring the overall PWM probability difference for a motif with or without a mutation is a reasonable strategy to measure the impact of the mutation.
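To make the idea of a PWM score difference concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy 4-position matrix with invented probabilities, a uniform background of 0.25, and a log-odds score in bits; the names `PWM`, `pwm_log_odds`, and `snv_score_change` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-position PWM (toy values, not taken from any database).
# Each column gives the probability of observing each base at that position.
PWM = [
    {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},  # well-conserved position
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # degenerate position
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_log_odds(seq, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2 odds of the site under the PWM vs. background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM width")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


def snv_score_change(ref_seq, pos, alt_base, pwm=PWM):
    """PWM score of the mutated site minus the score of the reference site."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return pwm_log_odds(alt_seq, pwm) - pwm_log_odds(ref_seq, pwm)


# The same substitution (A -> T) at a conserved and at a degenerate position.
conserved_change = snv_score_change("AAGC", 0, "T")
degenerate_change = snv_score_change("AAGC", 1, "T")
print(f"conserved position:  {conserved_change:.2f} bits")
print(f"degenerate position: {degenerate_change:.2f} bits")
```

In this toy matrix the substitution at the conserved position lowers the score by log2(0.90/0.02), roughly 5.5 bits, while the same substitution at the degenerate position leaves it unchanged. That contrast is why the position of an SNV within a motif matters when judging its likely effect.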
