Abstract

Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, is a common event during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at the nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent. Existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method, Conditional Shadowing via Multi-resolution Evolutionary Trees (CSMET), which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny, conditioned on the functional specification of each taxon. The functional specifications themselves are the output of a phylogeny that models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to account for lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders.

Highlights

  • Phylogenetic shadowing techniques based on probabilistic molecular evolution models have been widely used in various comparative genomic analyses to uncover sequence entities believed to be conserved across species [1,2,3,4].

  • Conventional methods for searching non-conserved motifs across evolutionarily related species have little or no probabilistic machinery to explicitly model transcription factor binding site (TFBS) turnover; they offer little insight into its mechanism and dynamics, and have limited power in finding motif patterns shaped by it.

  • We propose a new method, Conditional Shadowing via Multi-resolution Evolutionary Trees (CSMET), which uses a mathematically elegant and computationally efficient way to model biological sequence evolution at both the nucleotide level of each individual site and the functional level of a whole TFBS.


Summary

Introduction

Phylogenetic shadowing techniques based on probabilistic molecular evolution models have been widely used in various comparative genomic analyses to uncover sequence entities believed to be conserved across species [1,2,3,4]. We adopt a more general interpretation of the term, one reflecting the long-standing evolutionary principles and inferential technique underlying such analysis rather than the choice of study subjects: it refers to the class of methods that treat evolutionarily related entities as outcomes of some stochastic process structured as a phylogeny, whereby the relationships between the studied entities can be inferred and used to unravel their underlying characteristics of interest. [...] 2) Every site in the same entity evolves independently. Although not realistic, such a complete and independent shadowing model can lead to efficient algorithms for scoring aligned sequences. In practice it works well for modeling large and highly conserved functional entities, such as gene coding regions in phylogenetically closely related taxa, and it has led to a number of successful comparative genomic gene finders [5,6,7].
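To make the independent-site assumption concrete, the sketch below scores an ungapped alignment as a sum of per-column log-likelihoods, each computed by Felsenstein's pruning algorithm on a fixed tree. It is an illustration of conventional nucleotide-level shadowing only, not of CSMET itself. The Jukes-Cantor substitution model, the uniform root distribution, the nested-list tree encoding, and the names `TREE` and `ALIGNMENT` are all assumptions chosen for brevity and do not come from the paper.

```python
import math

BASES = "ACGT"


def jc69_prob(a, b, t):
    """Jukes-Cantor probability that base a is replaced by base b over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partial_likelihoods(node, column):
    """Felsenstein pruning: P(observed leaves below node | node has base x), for x in BASES.

    A leaf is a taxon name (str); an internal node is a list of
    (child, branch_length) pairs. This encoding is a hypothetical choice.
    """
    if isinstance(node, str):
        return [1.0 if column[node] == x else 0.0 for x in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:
        child_like = partial_likelihoods(child, column)
        for i, x in enumerate(BASES):
            result[i] *= sum(
                jc69_prob(x, y, t) * child_like[j] for j, y in enumerate(BASES)
            )
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, assuming a uniform root distribution."""
    return sum(0.25 * p for p in partial_likelihoods(tree, column))


def alignment_log_likelihood(tree, alignment):
    """Independent sites: the alignment log-likelihood is the sum over columns."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(length):
        column = {taxon: seq[k] for taxon, seq in alignment.items()}
        total += math.log(column_likelihood(tree, column))
    return total


# Hypothetical three-taxon tree ((A:0.1, B:0.1):0.05, C:0.2) and toy ungapped alignment.
TREE = [([("A", 0.1), ("B", 0.1)], 0.05), ("C", 0.2)]
ALIGNMENT = {"A": "ACGT", "B": "ACGA", "C": "ACTT"}

if __name__ == "__main__":
    print(alignment_log_likelihood(TREE, ALIGNMENT))
```

Because every column is scored separately, the cost grows linearly with alignment length, which is the efficiency the paragraph refers to. The same factorization is also why such a model cannot express whole-motif gain or loss: no term couples the sites of one binding site within a lineage. The sketch does not handle gap characters.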


