Abstract

Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. Results. We analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, of which 9107 genes contained conserved TFBSs in the 3 kb proximal promoter and first intron. To predict in vivo occupancy of TFBSs, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering. With it, we predicted the in vivo occupancy of two transcription factors with an overall accuracy of 84%. Conclusion. Our analyses show that integrating chromatin immunoprecipitation data with conserved TFBS analysis can generate accurate predictions of functional TFBSs. They also show that TFBS co-occurrence can be used to predict transcription factor binding to promoters in vivo.

Highlights

  • The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology

  • A primary reason that accurate prediction of relevant TFBSs remains difficult is that the motifs are short (6–12 bp) and degenerate; represented as position weight matrices (PWMs), they produce large numbers of false-positive matches in genomic sequences

  • The conserved transcription factor binding site (CONFAC) software identifies the conserved sequences in the 3 kb proximal promoter region and first intron of human-mouse ortholog gene pairs, then identifies the TFBSs, defined by position weight matrices from the MATCH software [11], that are conserved between the two species [1]
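
The idea in the last two highlights (a short PWM matches many sites, so only matches found in both species are kept) can be sketched as below. This is a minimal illustration and not the CONFAC or MATCH implementation: the count matrix `COUNTS`, the threshold, the example sequences, and the function names `log_odds`, `scan`, and `conserved_hits` are all hypothetical. A generic log-odds score stands in for whatever scoring MATCH applies, and the two sequences are assumed to be already aligned so that equal offsets correspond.

```python
import math

# Hypothetical 4-position count matrix (not a real MATCH/TRANSFAC matrix).
# Each entry holds the base counts observed at one motif position.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert counts to a log2-odds PWM against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((col[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return start offsets of windows scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip alignment gaps and ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def conserved_hits(human_aln, mouse_aln, pwm, threshold):
    """Keep only offsets where the motif matches in both aligned sequences."""
    human = set(scan(human_aln, pwm, threshold))
    mouse = set(scan(mouse_aln, pwm, threshold))
    return sorted(human & mouse)


pwm = log_odds(COUNTS)
human_seq = "TTACGTAAGGACGTCC"  # made-up aligned promoter fragments
mouse_seq = "TTACGTAACCACTTCC"
print(scan(human_seq, pwm, 5.0))
print(conserved_hits(human_seq, mouse_seq, pwm, 5.0))
```

In this toy input the human sequence carries two matches, but only the one that also matches at the same offset in mouse survives the conservation filter, which is how requiring cross-species conservation reduces false positives.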


Summary

Introduction

One of the important challenges in computational biology is the accurate prediction of functional transcription factor binding sites (TFBSs). We previously described the conserved transcription factor binding site (CONFAC) software, which uses a comparative genomic approach to identify evolutionarily conserved and statistically overrepresented … We have applied the CONFAC analysis to the complete set of 21222 RefSeq transcripts identified in the … We mined our conserved TFBS data in combination with public in vivo occupancy data [10] using …

