The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences.

Ralph S Peters, Lars Krogmann, Karen Meusemann, Kai Schütte, Janus Borner, Bernhard Misof, Oliver Niehuis, Benjamin Meyer

doi:10.1186/1741-7007-9-55

Abstract

Background: An enormous amount of molecular sequence data has accumulated over the past several years and is still growing exponentially with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of data that one can retrieve from public sequence repositories is virtually impossible to tame without dedicated software that automates processes. Here we present a novel bioinformatics pipeline for downloading, formatting, filtering and analyzing public sequence data deposited in GenBank. It combines some well-established programs with numerous newly developed software tools (available at http://software.zfmk.de/).

Results: We used the bioinformatics pipeline to investigate the phylogeny of the megadiverse insect order Hymenoptera (sawflies, bees, wasps and ants) by retrieving and processing more than 120,000 sequences and by selecting subsets under the criteria of compositional homogeneity and defined levels of density and overlap. Tree reconstruction was done with a partitioned maximum likelihood analysis from a supermatrix with more than 80,000 sites and more than 1,100 species. In the inferred tree, consistent with previous studies, "Symphyta" is paraphyletic. Within Apocrita, our analysis suggests a topology of Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))). Despite the huge amount of data, we identified several persistent problems in the Hymenoptera tree. Data coverage is still extremely low, and additional data have to be collected to reliably infer the phylogeny of Hymenoptera.

Conclusions: While we applied our bioinformatics pipeline to Hymenoptera, we designed the approach to be as general as possible. With this pipeline, it is possible to produce phylogenetic trees for any taxonomic group and to monitor new data and tree robustness in a taxon of interest. It therefore has great potential to meet the challenges of the phylogenomic era and to deepen our understanding of the tree of life.
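
To make the idea of selecting subsets by "density" and "overlap" more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that density means the fraction of filled cells in a taxon-by-gene matrix and that overlap means the number of genes shared by a pair of taxa; the paper may define both differently. The taxon names, gene names, the greedy pruning heuristic and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: taxon -> set of genes for which a sequence exists.
coverage = {
    "taxon_A": {"COI", "28S", "18S"},
    "taxon_B": {"COI", "28S"},
    "taxon_C": {"COI"},
    "taxon_D": {"EF1a"},
}


def density(cov):
    """Fraction of taxon-by-gene cells that hold a sequence (assumed definition)."""
    genes = set().union(*cov.values()) if cov else set()
    if not cov or not genes:
        return 0.0
    filled = sum(len(g) for g in cov.values())
    return filled / (len(cov) * len(genes))


def min_pairwise_overlap(cov):
    """Smallest number of genes shared by any two taxa (assumed definition)."""
    pairs = list(combinations(cov.values(), 2))
    if not pairs:
        return 0
    return min(len(a & b) for a, b in pairs)


def prune_to_density(cov, target):
    """Greedily drop the sparsest taxon until density >= target.

    This heuristic is an illustration only; ties are broken by taxon name.
    Genes left without any taxon drop out of the matrix automatically,
    because the gene set is recomputed from the remaining taxa.
    """
    cov = dict(cov)
    while len(cov) > 1 and density(cov) < target:
        sparsest = min(cov, key=lambda t: (len(cov[t]), t))
        del cov[sparsest]
    return cov


kept = prune_to_density(coverage, 0.6)
print(sorted(kept), density(kept), min_pairwise_overlap(kept))
```

In this toy case the two sparsest taxa are removed, which raises density and guarantees that the remaining taxa share at least one gene. A real data set would also need the compositional homogeneity check mentioned in the abstract, which this sketch does not attempt.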

Highlights

An enormous amount of molecular sequence data has accumulated over the past several years and is still growing exponentially with the use of faster and cheaper sequencing techniques.
We present a standardized, fast and transparent bioinformatics pipeline to collect, filter and analyze public sequence data deposited in GenBank.
Major lineages within Apocrita: our analysis suggests a topology of Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))) (Figure 3).
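
The nested "+" notation for the Apocrita topology maps directly onto the Newick tree format read by most phylogenetics software. The short sketch below shows that mapping; the nested-tuple representation and the function name `to_newick` are illustrative choices, and branch lengths and support values are left out because they are not given here.

```python
def to_newick(node):
    """Render a nested tuple of taxon names as a Newick subtree string."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Topology as stated in the text:
# Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata)))
apocrita = (
    "Stephanoidea",
    ("Ichneumonoidea",
     ("Proctotrupomorpha",
      ("Evanioidea", "Aculeata"))),
)

newick = to_newick(apocrita) + ";"
print(newick)
# (Stephanoidea,(Ichneumonoidea,(Proctotrupomorpha,(Evanioidea,Aculeata))));
```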

Summary

Introduction

An enormous amount of molecular sequence data has accumulated over the past several years and is still growing exponentially with the use of faster and cheaper sequencing techniques. We present a novel bioinformatics pipeline for downloading, formatting, filtering and analyzing public sequence data deposited in GenBank. It combines some well-established programs with numerous newly developed software tools (available at http://software.zfmk.de/). McMahon and Sanderson [2], Sanderson et al. [3] and Thomson and Shaffer [4] have published their attempts to use molecular data from public databases and to process them for phylogenetic analysis. These approaches, while valuable and trend-setting, did not offer thorough solutions and call for extension, improvement and updating in terms of generalization, detail, analysis and degree of automation. We use a large exemplar taxon for which far more than 100,000 sequences have been published and show that comprehensive analyses can potentially deliver new results that were not available from any of the included data sets separately.

Journal: BMC Biology
Publication Date: Aug 18, 2011
Citations: 101
License type: CC BY 2.0
