Abstract

Background: Transcription factor (TF) binding affinities to DNA play a key role in gene regulation. Learning the specificity of the mechanisms by which TFs bind to DNA is important to both experimentalists and theoreticians. With the development of high-throughput methods such as ChIP-seq, the need for unbiased models of binding events has become apparent. We present EMQIT, a modification of the approach introduced by Alamanova et al. and later implemented as the 3DTF server. We observed that tuning the Boltzmann factor weights, used to convert calculated energies into nucleotide probabilities, has a significant impact on the quality of the associated PWM.

Results: Consequently, we proposed to use receiver operating characteristic (ROC) curves and 10-fold cross-validation to learn the best weights from experimentally verified data in the TRANSFAC database. We applied our method to data available for various TFs. Using experimental data from the TRANSFAC database, we verified how efficiently the 3DTF matrices improved with our technique detect TF binding sites. The comparison showed significant similarity and comparable performance between the improved matrices and the experimental (TRANSFAC) matrices. The improved 3DTF matrices achieved significantly higher AUC values than the original 3DTF matrices (by at least 0.1) and, at the same time, detected notably more experimentally verified TFBSs.

Conclusions: The resulting improved PWMs for the analyzed factors are similar to TRANSFAC matrices and have comparable predictive capabilities. Moreover, the improved PWMs achieve better results than the matrices downloaded from the 3DTF server. The presented approach is general and applicable to any energy-based matrices. EMQIT is available online at http://biosolvers.polsl.pl:3838/emqit.

Reviewers: This article was reviewed by Oliviero Carugo, Marek Kimmel and István Simon.
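The conversion of calculated energies into nucleotide probabilities through Boltzmann factor weights, mentioned in the abstract, might look roughly like the sketch below. It is not the EMQIT or 3DTF implementation: the function name, the single scalar weight `beta`, and the example energies are all hypothetical and serve only to show why the choice of weight changes how sharply a PWM column favours one nucleotide.

```python
import math

def energies_to_probabilities(energies, beta):
    """Convert per-nucleotide energies at one motif position into probabilities.

    energies: dict mapping 'A', 'C', 'G', 'T' to an energy (lower = more favourable).
    beta: hypothetical tunable Boltzmann weight (plays the role of 1/kT).
    """
    # Shift by the minimum energy for numerical stability; the ratios are unchanged.
    e_min = min(energies.values())
    weights = {nt: math.exp(-beta * (e - e_min)) for nt, e in energies.items()}
    total = sum(weights.values())
    return {nt: w / total for nt, w in weights.items()}

# Made-up energies for a single PWM column (illustration only).
column = {"A": -2.0, "C": -0.5, "G": 0.0, "T": -1.0}

for beta in (0.5, 2.0):
    probs = energies_to_probabilities(column, beta)
    print(beta, {nt: round(p, 3) for nt, p in probs.items()})
```

A small weight flattens the column towards a uniform distribution, while a large weight concentrates the probability on the lowest-energy nucleotide, which is one way to picture why tuning the weights affects matrix quality.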

Highlights

  • Transcription factor binding affinities to DNA play a key role in gene regulation

  • Derived Position Weight Matrix (PWM) models of Transcription Factor Binding Site (TFBS) profiles are usually deposited in the JASPAR [4] and TRANSFAC [1] databases

  • Both NF-κB matrices improved in this study are less specific in their inner part than the matrices obtained from the 3DTF server


Summary

Introduction

Transcription factor binding affinities to DNA play a key role in gene regulation. With the development of high-throughput methods such as ChIP-seq, the need for unbiased models of binding events has become apparent. DNA-binding site models exist for over 1800 vertebrate TFs and about 3600 known Transcription Factor Binding Sites (TFBSs) in human and over 5000 in mouse. The total number of binding sites in multicellular genomes could be at least an order of magnitude higher than the number of coding genes [1]. The development of next-generation sequencing methods such as ChIP-seq and ChIP-chip, which cover TF binding over the whole genome, remarkably simplifies the analysis of gene regulation. Binding motifs in DNA are commonly represented by Position Weight Matrices (PWMs) and Phylogenetic Motif Models (PMMs).
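As a rough illustration of how a PWM is used to detect binding sites and how detection quality can be summarised by an AUC value, consider the minimal sketch below. The 3-position matrix, the uniform background of 0.25, the example sequences, and both helper functions are assumptions made for illustration; they are not taken from TRANSFAC, 3DTF, or EMQIT.

```python
import math

# Hypothetical 3-position PWM: one dict of nucleotide probabilities per position.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

def log_odds_score(pwm, site, background=0.25):
    """Sum of log2(p / background) over the positions of a candidate site."""
    return sum(math.log2(col[nt] / background) for col, nt in zip(pwm, site))

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up "verified" sites and background sequences.
positives = ["AGC", "AGT"]
negatives = ["TTT", "CCA"]

pos = [log_odds_score(pwm, s) for s in positives]
neg = [log_odds_score(pwm, s) for s in negatives]
print("AUC:", auc(pos, neg))
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve and keeps the example free of external dependencies; for real data a library routine would normally be preferred.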
