Efficient Enumeration of Phylogenetically Informative Substrings

Stanislav Angelov, Sanjeev Khanna, Junhyong Kim, Boulos Harb, Sampath Kannan

doi:10.1089/cmb.2007.r011


Open Access

https://doi.org/10.1089/cmb.2007.r011


Abstract

We study the problem of enumerating substrings that are common among genomes that share evolutionary descent. For example, one might want to enumerate all identical (and therefore conserved) substrings that are shared by all mammals and not found in any non-mammal. Such a collection of substrings may be used to identify conserved subsequences or to construct sets of identifying substrings for branches of a phylogenetic tree. For two disjoint sets of genomes on a phylogenetic tree, a substring is called a tag if it is found in all of the genomes of one set and in none of the genomes of the other set. We present a near-linear time algorithm that finds all tags in a given phylogeny, and a sublinear space algorithm (at the expense of running time) that is better suited to very large data sets. Under a stochastic model of evolution, we show that a simple process of tag generation essentially captures all possible ways of generating tags. We use this insight to develop a faster tag discovery algorithm with a small chance of error. However, since tags are not guaranteed to exist in a given data set, we generalize the notion of a tag from a single substring to a set of substrings. We present a linear programming-based approach for finding approximate generalized tag sets. Finally, we use our tag enumeration algorithm to analyze a phylogeny containing 57 whole microbial genomes. We find tags for all nodes in the phylogeny except the root, for which we find generalized tag sets.
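To make the definition of a tag concrete, here is a minimal Python sketch of a brute-force check. It is not the near-linear time algorithm described in the abstract; it simply intersects substring sets. The function names (`kmers`, `find_tags`), the restriction to a fixed substring length `k`, and the toy sequences are assumptions made for illustration.

```python
from typing import Iterable, Set


def kmers(genome: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def find_tags(inside: Iterable[str], outside: Iterable[str], k: int) -> Set[str]:
    """Brute-force tag search (illustrative only, not the paper's algorithm).

    A length-k substring is a tag if it occurs in every genome of
    `inside` and in no genome of `outside`.
    """
    inside = list(inside)
    if not inside:
        return set()
    # Substrings shared by every genome in the first set.
    common = set.intersection(*(kmers(g, k) for g in inside))
    # Discard anything that also appears in a genome of the second set.
    for g in outside:
        common -= kmers(g, k)
    return common


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    group_a = ["ACGTAC", "TTACGT"]
    group_b = ["GGGTAC", "CCCGTT"]
    print(sorted(find_tags(group_a, group_b, 3)))  # prints ['ACG']
```

This approach stores every length-k substring of every genome, so its memory use grows with total genome length, which is the kind of cost that motivates the more efficient algorithms the abstract mentions.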


Journal: Journal of Computational Biology
Publication Date: Jul 1, 2007
Citations: 3
