PhylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R.

Dominic Bennett, Hannes Hettling, Christine Bacon, Rutger Vos, Søren Faurby, Alexandre Antonelli, Alexander Zizka, Daniele Silvestro

doi:10.3390/life8020020

Open Access

Abstract

The exceptional increase in molecular DNA sequence data in open repositories is mirrored by an ever-growing interest among evolutionary biologists in harvesting and using those data for phylogenetic inference. Many quality issues are known, however, and the sheer amount and complexity of the available data can pose considerable barriers to their usefulness. A key issue in this domain is the high frequency of sequence mislabeling encountered when searching for suitable sequences for phylogenetic analysis. Mislabeling problems include, among others, the incorrect identification of sequenced species, non-standardized and ambiguous sequence annotation, and the inadvertent addition of paralogous sequences by users. Taken together, these issues likely add considerable noise, error or bias to phylogenetic inference, a risk that is likely to increase with the size of phylogenies or the molecular datasets used to generate them. Here we present a software package, phylotaR, that bypasses the above issues by instead using an alignment search tool to identify orthologous sequences. Our package builds on the framework of its predecessor, PhyLoTa, by providing a modular pipeline for identifying overlapping sequence clusters using up-to-date GenBank data and by providing new features, improvements and tools. We demonstrate and test our pipeline's effectiveness by presenting trees generated from phylotaR clusters for two large taxonomic clades: palms and primates. Given the versatility of this package, we hope that it will become a standard tool for any research aiming to use GenBank data for phylogenetic analysis.

Highlights

The first step in any nucleotide-based phylogenetic analysis is the identification of sequence homology
For a given taxonomic group, PhyLoTa searches through available sequences on GenBank and identifies orthologous sequence clusters
We demonstrate and test the capacity of phylotaR by generating phylogenetic trees for two model clades, widely studied in evolutionary biology: palms and primates

Summary

Introduction

The first step in any nucleotide-based phylogenetic analysis is the identification of sequence homology. In an early attempt to address the problems of sequence mislabeling and ambiguous annotation, Sanderson et al. [4] developed a pipeline, PhyLoTa, that uses the Basic Local Alignment Search Tool (BLAST [9]) to identify orthologous sequences without the need for gene name matching. For a given taxonomic group, PhyLoTa searches through available sequences on GenBank and identifies orthologous sequence clusters. New databases of orthologous sequences have become available [13,14]. These databases are not based on GenBank but instead on assembled genomes, which massively limits their taxonomic coverage. In phylotaR, a user has the option of a secondary cluster stage (cluster2) to identify and merge clusters at higher taxonomic levels than is available with PhyLoTa. We demonstrate and test the capacity of phylotaR by generating phylogenetic trees for two model clades, widely studied in evolutionary biology: palms and primates.
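
The idea of grouping sequences into clusters from alignment search results, rather than from gene names, can be illustrated with a short sketch. This is not the PhyLoTa or phylotaR implementation. It assumes that pairwise hits have already been produced by an alignment search tool such as BLAST and reduced to (query, subject, coverage) tuples, and it links sequences by simple single-linkage clustering. The function name cluster_hits, the min_coverage threshold and the example identifiers are all hypothetical.

```python
from collections import defaultdict


def cluster_hits(seq_ids, hits, min_coverage=0.5):
    """Single-linkage clustering of sequences from pairwise alignment hits.

    seq_ids: iterable of sequence identifiers.
    hits: iterable of (query_id, subject_id, coverage) tuples. The coverage
          field (fraction of the sequence covered by the alignment) is an
          assumption of this sketch; real search output needs parsing first.
    min_coverage: hypothetical threshold below which a hit is ignored.
    """
    # Union-find structure: every sequence starts as its own cluster.
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk to the root of x's cluster, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, coverage in hits:
        if coverage >= min_coverage:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q  # merge the two clusters

    clusters = defaultdict(list)
    for s in parent:
        clusters[find(s)].append(s)
    # Largest clusters first, with members sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


example_ids = ["seqA", "seqB", "seqC", "seqD", "seqE"]
example_hits = [
    ("seqA", "seqB", 0.92),
    ("seqB", "seqC", 0.81),
    ("seqC", "seqD", 0.30),  # below the threshold, so no link is made
    ("seqD", "seqE", 0.75),
]
print(cluster_hits(example_ids, example_hits))
# [['seqA', 'seqB', 'seqC'], ['seqD', 'seqE']]
```

In this toy input, seqA and seqC end up in the same cluster even though they never hit each other directly, because both are linked through seqB. That transitive grouping is what allows clusters to form without relying on sequence labels.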

Journal: Life
Publication Date: Jun 5, 2018
Citations: 28
License type: CC BY 4.0
