Identifying optimal incomplete phylogenetic data sets from sequence databases

Changhui Yan, J Gordon Burleigh, Oliver Eulenstein

doi:10.1016/j.ympev.2005.02.008

Abstract

We introduce a new method for identifying optimal incomplete data sets from large sequence databases based on the graph-theoretic concept of α-quasi-bicliques. The quasi-biclique method searches large sequence databases to identify useful phylogenetic data sets with a specified amount of missing data while maintaining the necessary amount of overlap among genes and taxa. The utility of the quasi-biclique method is demonstrated on large simulated sequence databases and on a data set of green plant sequences from GenBank. The quasi-biclique method greatly increases the taxon and gene sampling in the data sets while adding only a limited amount of missing data. Furthermore, under the conditions of the simulation, data sets with a limited amount of missing data often produce topologies nearly as accurate as those built from complete data sets. The quasi-biclique method will be an effective tool for exploiting sequence databases for phylogenetic information and may also help identify critical sequences needed to build large phylogenetic data sets.
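The abstract does not spell out the α-quasi-biclique condition, so the following is only a minimal sketch under an assumed formulation: a block of taxa and genes qualifies if every taxon has sequences for at least (1 − α) of the genes and every gene is sampled in at least (1 − α) of the taxa. The function names, the `presence` mapping, and the toy data are hypothetical and are not taken from the paper; the sketch only checks a given block and does not implement the authors' search procedure.

```python
def is_alpha_quasi_biclique(presence, taxa, genes, alpha):
    """Check an ASSUMED alpha-quasi-biclique condition on a taxa x genes block.

    presence: dict mapping taxon name -> set of gene names with a sequence.
    alpha:    tolerated fraction of missing entries per taxon and per gene.
    """
    taxa, genes = list(taxa), list(genes)
    eps = 1e-9  # guard against floating-point rounding in the thresholds
    min_genes_per_taxon = (1 - alpha) * len(genes)
    min_taxa_per_gene = (1 - alpha) * len(taxa)

    # Every taxon must be sampled for enough of the selected genes.
    for t in taxa:
        have = sum(1 for g in genes if g in presence.get(t, set()))
        if have + eps < min_genes_per_taxon:
            return False

    # Every gene must be sampled in enough of the selected taxa.
    for g in genes:
        have = sum(1 for t in taxa if g in presence.get(t, set()))
        if have + eps < min_taxa_per_gene:
            return False
    return True


def missing_fraction(presence, taxa, genes):
    """Fraction of empty cells in the selected taxa x genes block."""
    taxa, genes = list(taxa), list(genes)
    filled = sum(1 for t in taxa for g in genes if g in presence.get(t, set()))
    return 1 - filled / (len(taxa) * len(genes))


# Hypothetical toy database: which genes are available for which taxa.
presence = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3"},
    "C": {"g1", "g2", "g4"},
    "D": {"g1", "g3", "g4"},
}
taxa = ["A", "B", "C", "D"]
genes = ["g1", "g2", "g3", "g4"]

print(is_alpha_quasi_biclique(presence, taxa, genes, alpha=0.25))
print(is_alpha_quasi_biclique(presence, taxa, genes, alpha=0.0))
print(missing_fraction(presence, taxa, genes))
```

With α = 0 the condition reduces to a complete block (an ordinary biclique), which is why the toy block fails at α = 0 but passes once one missing entry per taxon and per gene is tolerated.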


Journal: Molecular Phylogenetics and Evolution
Publication Date: Mar 21, 2005
Citations: 62

Similar Papers

Obtaining maximal concatenated phylogenetic data sets from large sequence databases.
M J Sanderson
Molecular Biology and Evolution | VOL. 20
25 Apr 2003

Identification of cross-linked peptides from large sequence databases.
Oliver Rinner ... Alexander Schmidt
Nature Methods | VOL. 5
09 Mar 2008

Efficient processing of similarity search under time warping in sequence databases: an index-based approach
Sang-Wook Kim ... Wesley W Chu
Information Systems | VOL. 29
05 Jun 2003

Knowledge acquisition in incomplete fuzzy information systems via the rough set approach
Wei‐Zhi Wu ... Wen‐Xiu Zhang
Expert Systems | VOL. 20
10 Oct 2003
