Abstract

We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships. These pages provide the scientific community with a resource for developing and testing new alignment and tree inference methods on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation given aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments suitable for testing alignment and phylogenetic methods for large-scale systematics studies. We also provide links to other empirical datasets that lack curated alignments. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitution and of insertions and deletions (indels), and we include the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.
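Because each simulated dataset comes with its true alignment, an estimated alignment can be scored directly against it. As a hedged illustration only (this is not one of the tools linked from the benchmark pages), the sketch below computes SP-FN and SP-FP error rates: the fraction of truly homologous residue pairs that the estimated alignment misses, and the fraction of residue pairs it asserts that are not in the true alignment. It assumes each alignment is a Python dict mapping sequence names to equal-length gapped strings with `-` as the gap character; the function names `homologous_pairs` and `sp_error_rates` are hypothetical.

```python
from itertools import combinations


def homologous_pairs(alignment):
    """Return the set of residue pairs that an alignment places in the same column.

    Assumed input: dict of sequence name -> gapped string ('-' = gap), all the
    same length. Each pair is ((name_a, i), (name_b, j)) where i and j are
    ungapped residue positions and name_a < name_b, so pairs from different
    alignments of the same sequences compare equal.
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    names = sorted(alignment)

    # Map each column to the ungapped residue index it holds (None for a gap).
    residue_index = {}
    for name in names:
        positions, counter = [], 0
        for ch in alignment[name]:
            if ch == "-":
                positions.append(None)
            else:
                positions.append(counter)
                counter += 1
        residue_index[name] = positions

    pairs = set()
    for col in range(lengths.pop()):
        present = [(n, residue_index[n][col]) for n in names
                   if residue_index[n][col] is not None]
        # Every pair of residues sharing a column is asserted homologous.
        pairs.update(combinations(present, 2))
    return pairs


def sp_error_rates(true_aln, est_aln):
    """Return (SP-FN, SP-FP) for an estimated alignment against the true one."""
    if set(true_aln) != set(est_aln):
        raise ValueError("alignments must contain the same sequences")
    true_pairs = homologous_pairs(true_aln)
    est_pairs = homologous_pairs(est_aln)
    shared = len(true_pairs & est_pairs)
    sp_fn = 1 - shared / len(true_pairs) if true_pairs else 0.0
    sp_fp = 1 - shared / len(est_pairs) if est_pairs else 0.0
    return sp_fn, sp_fp


if __name__ == "__main__":
    true_aln = {"a": "AC-T", "b": "ACGT"}
    est_aln = {"a": "ACT-", "b": "ACGT"}
    # Two of the three true pairs are recovered, so both rates are 1/3.
    print(sp_error_rates(true_aln, est_aln))
```

Enumerating pairs column by column is quadratic in the number of sequences per column, so for alignments with thousands of taxa a dedicated tool such as FastSP is a better choice; the sketch only shows what the measure counts.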

Highlights

  • We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships

  • The empirical benchmark datasets for testing multiple sequence alignment include datasets with highly reliable curated sequence alignments that have been carefully validated by the community

  • The curated alignments provided in the Comparative RNA Website (CRW) are based upon secondary structural information, which is helpful in regions where the mature rRNA is double stranded because of sequence complementarity



Introduction

To facilitate testing of large-scale alignment and phylogeny estimation methods, and to enable evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships, we have assembled a collection of web pages (http://www.cs.utexas.edu/users/phylo/datasets/) containing (1) benchmark datasets and (2) software appropriate for creating new simulated benchmark datasets. Because these datasets have been assembled with an eye to their usefulness for Tree of Life-scale projects, only datasets that have large numbers of taxa and/or present other difficulties for phylogenetic reconstruction and alignment (e.g., high rates of substitution and of insertions and deletions) are included.
