Abstract

A major challenge in the biological sciences is the reconstruction of the Tree of Life. To this end, large genomic databases such as GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees, which reveal information that was not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves: sets of clusters with the potential to construct informative supertrees. Groves thus make it possible to identify, in large databases, maximal sets of candidate clusters for informative supertree construction before a tree is inferred for each cluster. Groves also have the potential to lead to informative supermatrix construction. We develop methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.
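
The abstract does not state the formal definition of a grove, so the following is only a rough illustration of the underlying idea that taxon overlap among clusters can be checked before any tree is inferred. It groups clusters into connected components of an overlap graph, joining two clusters when they share at least `min_shared` taxa. The function name `overlap_components`, the threshold `min_shared`, and the toy cluster contents are all assumptions made for this sketch; the components it returns are a crude proxy and are not groves in the paper's sense.

```python
from itertools import combinations


def overlap_components(clusters, min_shared=2):
    """Group clusters into connected components of an overlap graph.

    clusters: dict mapping a cluster name to its set of taxa.
    min_shared: assumed threshold; two clusters are linked when they
    share at least this many taxa. This is NOT the paper's grove test.
    """
    names = list(clusters)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Link every pair of clusters whose taxon sets overlap enough.
    for a, b in combinations(names, 2):
        if len(clusters[a] & clusters[b]) >= min_shared:
            parent[find(a)] = find(b)

    # Collect cluster names by their component representative.
    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy clusters (taxon sets chosen only for illustration).
clusters = {
    "rbcL": {"Oryza", "Zea", "Arabidopsis", "Pinus"},
    "matK": {"Zea", "Arabidopsis", "Ginkgo"},
    "atpB": {"Pinus", "Picea"},
    "18S": {"Marchantia", "Physcomitrella", "Selaginella"},
}
components = overlap_components(clusters)
print(components)
```

With the toy data, only `rbcL` and `matK` share two taxa, so they land in one component while `atpB` and `18S` stay on their own. Lowering `min_shared` to 1 also pulls in `atpB` through its single shared taxon with `rbcL`, which shows how strongly the grouping depends on the assumed overlap criterion.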
