Abstract

Background: Algorithms that locate evolutionarily conserved sequences have become powerful tools for finding functional DNA elements, including transcription factor binding sites; however, most methods do not take advantage of an explicit model for the constrained evolution of functional DNA sequences.

Results: We developed a probabilistic framework that combines an HKY85 model, which assigns probabilities to different base substitutions between species, and weight matrix models of transcription factor binding sites, which describe the probabilities of observing particular nucleotides at specific positions in the binding site. The method incorporates the phylogenies of the species under consideration and takes into account the position-specific variation of transcription factor binding sites. Using our framework, we assessed the suitability of alignments of genomic sequences from commonly used species as substrates for comparative genomic approaches to regulatory motif finding. We then applied this technique to Saccharomyces cerevisiae and related species by examining all possible six-base-pair DNA sequences (hexamers) and identifying sequences that are conserved in a significant number of promoters. By combining similar conserved hexamers, we reconstructed known cis-regulatory motifs and made predictions of previously unidentified motifs. We tested one prediction experimentally, finding it to be a regulatory element involved in the transcriptional response to glucose.

Conclusion: The experimental validation of a regulatory element prediction missed by other large-scale motif finding studies demonstrates that our approach is a useful addition to the current suite of tools for finding regulatory motifs.

Highlights

  • Algorithms that locate evolutionarily conserved sequences have become powerful tools for finding functional DNA elements, including transcription factor binding sites; however, most methods do not take advantage of an explicit model for the constrained evolution of functional DNA sequences

  • For any given sequence in a multiple alignment taken from different species, we determine whether the pattern of substitutions better fits a neutral model of evolution or a conserved model of transcription factor binding site (TFBS) evolution

  • The two models are identical except for their equilibrium base frequencies: the neutral model uses genomic base frequencies, whereas the conserved TFBS model uses position-specific base frequencies derived from weight matrix models of specific TFBSs
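
The comparison described in the last two highlights can be illustrated with a minimal sketch. It is not the authors' implementation: it is simplified to two aligned sequences separated by a single branch, whereas the paper works with multiple alignments and a full phylogeny. The function names, the parameter `bt` (rate times branch length), the value of `kappa`, and the example weight matrix are all assumptions chosen for illustration. The HKY85 transition probabilities use the standard closed-form solution.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}

def hky85_prob(i, j, freqs, kappa, bt):
    """P(i -> j) under HKY85 with equilibrium frequencies `freqs`.

    `kappa` is the transition/transversion rate ratio and `bt` is the
    rate times the branch length (an illustrative parameterisation).
    All frequencies must be non-zero (use pseudocounts in weight matrices).
    """
    e1 = math.exp(-bt)
    if (i in PURINES) != (j in PURINES):
        # Transversion: purine <-> pyrimidine.
        return freqs[j] * (1.0 - e1)
    # Total frequency of the class (purines or pyrimidines) containing j.
    big_pi = sum(freqs[b] for b in BASES if (b in PURINES) == (j in PURINES))
    e2 = math.exp(-bt * (1.0 + big_pi * (kappa - 1.0)))
    common = freqs[j] + freqs[j] * (1.0 / big_pi - 1.0) * e1
    if i == j:
        return common + (big_pi - freqs[j]) / big_pi * e2
    return common - freqs[j] / big_pi * e2  # transition within a class

def pair_log_likelihood(seq_a, seq_b, freq_columns, kappa, bt):
    """Log-likelihood of two aligned, gap-free sequences.

    `freq_columns` gives one equilibrium frequency dict per position.
    """
    total = 0.0
    for x, y, freqs in zip(seq_a, seq_b, freq_columns):
        total += math.log(freqs[x] * hky85_prob(x, y, freqs, kappa, bt))
    return total

def log_likelihood_ratio(seq_a, seq_b, weight_matrix, background, kappa, bt):
    """Conserved-TFBS model versus neutral model; positive favours the TFBS."""
    conserved = pair_log_likelihood(seq_a, seq_b, weight_matrix, kappa, bt)
    neutral = pair_log_likelihood(
        seq_a, seq_b, [background] * len(seq_a), kappa, bt
    )
    return conserved - neutral

if __name__ == "__main__":
    # Hypothetical three-position weight matrix preferring A, C, G.
    def column(preferred):
        return {b: (0.85 if b == preferred else 0.05) for b in BASES}

    wm = [column("A"), column("C"), column("G")]
    bg = {b: 0.25 for b in BASES}  # assumed uniform genomic frequencies
    print(log_likelihood_ratio("ACG", "ACG", wm, bg, kappa=2.0, bt=0.5))
    print(log_likelihood_ratio("TTT", "TTT", wm, bg, kappa=2.0, bt=0.5))
```

Because both models share the same substitution process and differ only in their equilibrium frequencies, a site that matches the weight matrix and is preserved across species yields a positive log-likelihood ratio, while a preserved site that does not resemble the motif scores below zero.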


Summary

Introduction

Algorithms that locate evolutionarily conserved sequences have become powerful tools for finding functional DNA elements, including transcription factor binding sites; however, most methods do not take advantage of an explicit model for the constrained evolution of functional DNA sequences. The central assumption of comparative genomics is that functional sequences evolve under constraints while nonfunctional sequences evolve neutrally. This simple assumption underlies several useful algorithms that identify coding genes [1,2], non-coding RNAs [3,4,5], and cis-regulatory sites [6,7,8,9,10,11]. We developed a …

