Abstract

Background: As the number of sequenced genomes grows, researchers have access to an increasingly rich source for discovering detailed evolutionary information. However, the computational technologies for inferring biologically important evolutionary events are not sufficiently developed.

Results: We present algorithms to estimate the evolutionary time (tMRS) to the most recent substitution event from a multiple alignment column by using a probabilistic model of sequence evolution. As the confidence in estimated tMRS values varies depending on gap fractions and nucleotide patterns of alignment columns, we also compute the standard deviation σ of tMRS by using a dynamic programming algorithm. We identified, with confidence, a number of human genomic sites at which the last substitutions occurred between two speciation events in the human lineage. A large fraction of such sites have substitutions that occurred between the concestor nodes of Hominoidea and Euarchontoglires. We investigated the correlation between tissue-specific transcribed enhancers and the distribution of the sites with specific substitution time intervals, and found that brain-specific transcribed enhancers are threefold enriched in the density of substitutions in the human lineage relative to expectations.

Conclusions: We have presented algorithms to estimate the evolutionary time (tMRS) to the most recent substitution event from a multiple alignment column by using a probabilistic model of sequence evolution. Our algorithms will be useful for Evo-Devo studies, as they facilitate screening potential genomic sites that have played an important role in the acquisition of unique biological features by target species.

Highlights

  • As the number of sequenced genomes grows, researchers have access to an increasingly rich source for discovering detailed evolutionary information

  • [...] of genomes [8,9,10,11,12,13,14]. These statistics are computed using probabilistic models of the stochastic processes of DNA mutation along phylogenetic species trees, which are used in tree reconstruction [4,5,6], and they detect genomic regions with lower or higher mutation rates using likelihood ratio tests or similar probabilistic computations (Kiryu et al., Algorithms Mol Biol (2019) 14:23)

  • We develop algorithms to compute three statistics, tMRS, σ, and q, for each column of a multiple genome alignment based on an evolutionary model that is similar to those described above. tMRS is the evolutionary time to the most recent substitution event that occurred along the lineage of a given target species in the phylogenetic tree
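
To give a feel for the quantity tMRS, here is a minimal sketch that is not the authors' algorithm. It assumes substitutions along the target lineage follow a homogeneous Poisson process with a hypothetical rate `rate`, ignores the alignment column entirely, and caps the waiting time at a hypothetical tree height `horizon`. Under those assumptions the time back to the most recent substitution is a truncated exponential variable, so its mean and standard deviation have closed forms. The function name `t_mrs_prior` is illustrative only.

```python
import math


def t_mrs_prior(rate: float, horizon: float) -> tuple[float, float]:
    """Mean and standard deviation of min(X, horizon) with X ~ Exponential(rate).

    Toy, data-free stand-in for tMRS and sigma: X is the time back to the most
    recent event of a Poisson process with the given rate (an assumption), and
    horizon caps X at the depth of the lineage (also an assumption).
    """
    if rate <= 0 or horizon <= 0:
        raise ValueError("rate and horizon must be positive")
    decay = math.exp(-rate * horizon)  # probability of no substitution at all
    mean = (1.0 - decay) / rate  # E[min(X, horizon)]
    # E[min(X, horizon)^2] = integral of 2 x exp(-rate x) over [0, horizon]
    second_moment = 2.0 * (1.0 - decay * (1.0 + rate * horizon)) / rate**2
    variance = max(second_moment - mean**2, 0.0)  # guard against round-off
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    mean, sd = t_mrs_prior(rate=0.5, horizon=2.0)
    print(f"mean = {mean:.4f}, sd = {sd:.4f}")
```

The estimates described in the abstract condition on the observed nucleotides and gaps in each alignment column and use dynamic programming over the phylogenetic tree. This sketch omits all of that and only illustrates why a standard deviation is a natural companion to the point estimate.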


Summary

Introduction

As the number of sequenced genomes grows, researchers have access to an increasingly rich source for discovering detailed evolutionary information. As it is difficult to visually inspect functional regions in 100-species alignments, computing genome-wide summary statistics is very important. [...] of genomes [8,9,10,11,12,13,14]. These statistics are computed using probabilistic models of the stochastic processes of DNA mutation along phylogenetic species trees, which are used in tree reconstruction [4,5,6], and they detect genomic regions with lower or higher mutation rates using likelihood ratio tests or similar probabilistic computations.

