Abstract

Comparative genomics jointly analyzes evolutionarily related species to identify conserved and diverged sequences and to infer their function. While such studies have enabled the detection of conserved sequences in large genomes, the evolutionary dynamics of regulatory regions as a whole remain poorly understood. Here we present a probabilistic model for the evolution of promoter regions in yeast that combines the effects of regulatory interactions of many different transcription factors. The model explicitly expresses the selection forces acting on transcription factor binding sites in the context of a dynamic evolutionary process. We develop algorithms to compute likelihoods and to learn, de novo, collections of transcription factor binding motifs and their selection parameters from alignments. Using these techniques, we examine the evolutionary dynamics of promoters in Saccharomyces species. Analyses of two evolutionary models, one constructed from all known transcription factor binding motifs and one learned automatically from the data, reveal relatively weak selection on most binding sites. Moreover, according to our estimates, strong binding sites constrain only a fraction of the yeast promoter sequence that is under selection. Our study demonstrates how complex evolutionary dynamics in noncoding regions emerge from formalizing the evolutionary consequences of known regulatory mechanisms.

Highlights

  • Genomic regulatory regions harbor complex control schemes that collectively allow the genome to operate in a flexible and dynamic fashion

  • Short DNA sequences that physically bind transcription factors in promoter areas near target genes play an important role in gene regulation and are directly subject to mutation and selection

  • We develop a methodology for studying the evolution of promoter sequences under the effect of multiple regulatory interactions

Introduction

Genomic regulatory regions harbor complex control schemes that collectively allow the genome to operate in a flexible and dynamic fashion. These control schemes are encoded in the DNA sequence in a way that is not yet fully understood. Important elements of this regulatory code are short DNA sequences that are bound by transcription factors (TFs). Much of the current understanding of how DNA determines the regulatory program of a gene is based on the identification of TF binding sites (TFBSs) and their association with TFs of known function. Many conserved loci were shown to correspond to TFBSs, allowing detection of novel sites that were not identifiable using single-species methods.
