Abstract

In eukaryotic genomes, it is challenging to accurately determine the target sites of transcription factors (TFs) using sequence information alone. Previous efforts tackled this task by exploiting two observations: TF binding sites tend to be more conserved than other functional sites, and the binding sites of several TFs are often clustered. Recently, ChIP-chip and ChIP-sequencing data have accumulated that identify TF binding sites and survey chromatin modification patterns at regulatory elements such as promoters and enhancers. We propose here a hidden Markov model (HMM) that incorporates sequence motif information, TF-DNA interaction data, and chromatin modification patterns to precisely identify cis-regulatory modules (CRMs). We conducted ChIP-chip experiments on four TFs (CREB, E2F1, MAX, and YY1) in 1% of the human genome. We then trained the HMM to label CRMs by incorporating the sequence motifs recognized by these TFs and the ChIP-chip ratio. Chromatin modification data were used to predict functional sites and to further remove false positives. Cross-validation showed that our integrated HMM outperformed existing methods in predicting CRMs. Incorporating histone signature information successfully penalized false predictions and improved overall performance. The dataset we used and the software are available at http://nash.ucsd.edu/CIS/.

Highlights

  • High-throughput technologies such as ChIP-chip [1,2] and ChIP-sequencing [3,4] have been successfully applied to map the binding locations of individual transcription factors (TFs) at a genomic scale in organisms ranging from yeast to human [2,5,6,7]

  • Human genes are often under combinatorial regulation by TFs, and the functional binding sites of cooperating TFs tend to lie close to each other, forming clusters in the eukaryotic genome [8] that are often referred to as cis-regulatory modules (CRMs). Locating CRMs has therefore proven effective in improving the accuracy of TF binding prediction and in uncovering functional binding sites

  • The individual-TF and multiple-TF hidden Markov models (HMMs) have the same structure; the only difference is the number of position-specific scoring matrix (PSSM) blocks: one PSSM block in the individual-TF HMM and multiple PSSM blocks in the multiple-TF HMM
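
The shared structure described in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single background state plus one linear chain of states per PSSM, and decodes with the standard Viterbi algorithm. The names `toy_pssm`, `build_hmm`, `viterbi`, `TF_A`, `TF_B`, the made-up consensus strings, and the entry probability `p_enter` are all hypothetical. The sketch covers sequence emissions only and leaves out the ChIP-chip ratio and chromatin modification data.

```python
import math

BASES = "ACGT"

def toy_pssm(consensus, hit=0.85):
    """Hypothetical PSSM: each column favours one consensus base."""
    other = (1.0 - hit) / 3.0
    return [{b: (hit if b == c else other) for b in BASES} for c in consensus]

def build_hmm(pssms, p_enter=0.01):
    """One background state plus one block of states per PSSM.

    With a single PSSM this is the individual-TF layout; with several,
    the multiple-TF layout. p_enter is an assumed, untrained value.
    """
    states = ["bg"]
    emit = {"bg": {b: 0.25 for b in BASES}}
    trans = {"bg": {"bg": 1.0 - p_enter}}
    for name, cols in pssms.items():
        block = [f"{name}:{i}" for i in range(len(cols))]
        states.extend(block)
        # Background enters each block with equal probability.
        trans["bg"][block[0]] = p_enter / len(pssms)
        for i, (state, col) in enumerate(zip(block, cols)):
            emit[state] = col
            # Blocks are traversed left to right, then return to background.
            nxt = block[i + 1] if i + 1 < len(block) else "bg"
            trans[state] = {nxt: 1.0}
    return states, emit, trans

def viterbi(seq, states, emit, trans):
    """Most probable state path (log space), starting and ending in 'bg'."""
    neg_inf = float("-inf")
    score = {s: neg_inf for s in states}
    score["bg"] = math.log(emit["bg"][seq[0]])
    pointers = []
    for ch in seq[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best, best_prev = neg_inf, None
            for r in states:
                p = trans[r].get(s, 0.0)
                if p > 0.0 and score[r] > neg_inf:
                    cand = score[r] + math.log(p)
                    if cand > best:
                        best, best_prev = cand, r
            e = emit[s][ch]
            reachable = best_prev is not None and e > 0.0
            new_score[s] = best + math.log(e) if reachable else neg_inf
            ptr[s] = best_prev
        pointers.append(ptr)
        score = new_score
    # Trace back from the background state at the last position.
    path = ["bg"]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

if __name__ == "__main__":
    pssms = {"TF_A": toy_pssm("TGACG"), "TF_B": toy_pssm("GGTTA")}
    states, emit, trans = build_hmm(pssms)
    for base, state in zip("AAAATGACGCCCGGTTAAAAA",
                           viterbi("AAAATGACGCCCGGTTAAAAA", states, emit, trans)):
        print(base, state)
```

Passing a dictionary with one entry to `build_hmm` gives the individual-TF layout, and passing several gives the multiple-TF layout, so the decoding code is identical in both cases.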

Introduction

High-throughput technologies such as ChIP-chip [1,2] and ChIP-sequencing [3,4] have been successfully applied to map the binding locations of individual transcription factors (TFs) at a genomic scale in organisms ranging from yeast to human [2,5,6,7]. Human genes are often under combinatorial regulation by TFs, and the functional binding sites of cooperating TFs tend to lie close to each other, forming clusters in the eukaryotic genome [8] that are often referred to as cis-regulatory modules (CRMs). Locating CRMs has therefore proven effective in improving the accuracy of TF binding prediction and in uncovering functional binding sites. Methods such as CisModule [12] and EmcModule [13] conduct de novo identification of CRMs in the sense of simultaneously defining
