Abstract

Background: DNA barcoding and other DNA sequence-based techniques for investigating and estimating biodiversity require explicit methods for associating individual sequences with taxa, as it is at the taxon level that biodiversity is assessed. For many projects, the bioinformatic analyses required pose problems for laboratories whose prime expertise is not in bioinformatics. User-friendly tools are required for both clustering sequences into molecular operational taxonomic units (MOTU) and for associating these MOTU with known organismal taxonomies.

Results: Here we present jMOTU, a Java program for the analysis of DNA barcode datasets that uses an explicit, determinate algorithm to define MOTU. We demonstrate its usefulness for both individual specimen-based Sanger sequencing surveys and bulk-environment metagenetic surveys using long-read next-generation sequencing data. jMOTU is driven through a graphical user interface, and can analyse tens of thousands of sequences in a short time on a desktop computer. A companion program, Taxonerator, that adds traditional taxonomic annotation to MOTU, is also presented. Clustering and taxonomic annotation data are stored in a relational database, and are thus amenable to subsequent data mining and web presentation.

Conclusions: jMOTU efficiently and robustly identifies the molecular taxa present in survey datasets, and Taxonerator decorates the MOTU with putative identifications. jMOTU and Taxonerator are freely available from http://www.nematodes.org/.

Highlights

  • The Linnaean project has already delivered species names for over a million taxa [1], but current estimates for the actual number of species on Earth range from 10 to 100 million [2,3].

  • Molecular survey methods have been proposed as a practical solution to bridge the gulf between the desire and need to describe diversity and the number of hands and minds available to do the describing [4,5,6,7]. These methods use the DNA sequence of a conserved gene or gene fragment and objective clustering rules to group the sequences into molecular operational taxonomic units (MOTU) [4].

  • [...] sequence-based surveys of diversity has focussed on clustering of the individual sequences into MOTU using a sequence similarity cutoff derived from the known within-species diversity in the surveyed gene. (Source: PLoS ONE, www.plosone.org, "jMOTU & Taxonerator for DNA Barcode Analysis")
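
The cutoff idea in the highlights above can be made concrete with a little arithmetic. The following is a minimal sketch, not jMOTU code: the function names are hypothetical, the sequences are assumed to be already aligned to equal length, and the 600-base alignment length is used purely as an illustrative figure.

```python
import math


def base_differences(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def bases_to_percent(bases: int, length: int) -> float:
    """Express a cutoff in bases as a percentage of the alignment length."""
    return 100.0 * bases / length


def percent_to_base_cutoff(percent: float, length: int) -> int:
    """Largest whole number of differing bases that stays within `percent`.

    The small epsilon guards against floating-point results such as 5.999...
    """
    return math.floor(percent / 100.0 * length + 1e-9)


# Illustrative values only (hypothetical, not taken from any dataset).
diff = base_differences("ACGTACGT", "ACGTACGA")
pct = bases_to_percent(2, 600)
cutoff = percent_to_base_cutoff(1.0, 600)
```

Working in whole bases rather than percentages keeps the cutoff unambiguous for a fixed alignment length; the percentage is only a convenient way to compare cutoffs across genes of different length.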


Summary

Results

Taxonerator analysis time depends on the number of representative sequences that must be compared; the above dataset, analysed at cutoffs from 3 to 10 bases, required 4 hr on the same workstation. As the base cutoff for MOTU definition was increased, there was an initial sharp fall in the number of MOTU inferred, dropping to 32 MOTU at 2 bases difference (approximately 0.3% difference across 600 bases). This steep drop is what would be expected from analysis of data that include rare stochastic sequencing error and within-population variability. To analyse the complete dataset (all 9 samples), we used the jMOTU PostgreSQL database to extract the representative sequences for all samples at the 3 base cutoff. These were pooled, and a second-tier jMOTU analysis was performed. The example datasets analysed in this paper are available on GenBank/EMBL/DDBJ and from the jMOTU website.
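
The cutoff sweep and the two-tier pooling described above can be illustrated with a small sketch. This is not the jMOTU algorithm, which this summary does not describe in detail. As stated assumptions, it uses generic single-linkage clustering on mismatch counts between pre-aligned, equal-length toy sequences, and it takes the first member of each cluster as its representative. All names and data are hypothetical.

```python
from itertools import combinations


def base_differences(a, b):
    """Mismatch count between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster(seqs, cutoff):
    """Single-linkage clustering: link any pair differing by <= cutoff bases."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if base_differences(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def representatives(clusters):
    """Assumption: the first sequence in each cluster stands in for it."""
    return [c[0] for c in clusters]


# Toy samples (hypothetical data, 10 bases each).
sample_a = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
sample_b = ["AAAAAAAATT", "GGGGGGGGGG", "GGGGGGGGGC"]

# Sweep the cutoff: the cluster count falls as near-identical reads merge.
motu_counts = {c: len(cluster(sample_a + sample_b, c)) for c in range(4)}

# Two-tier analysis: cluster each sample, pool representatives, cluster again.
tier1 = [cluster(s, 1) for s in (sample_a, sample_b)]
pooled = [r for clusters in tier1 for r in representatives(clusters)]
tier2 = cluster(pooled, 2)
```

In this toy data the count drops from six clusters at a cutoff of 0 to three at a cutoff of 1 and then stays flat. This is the same qualitative pattern as the steep initial fall described above, where small cutoffs absorb sequencing error and within-population variation. Pooling only representatives keeps the second-tier comparison small, since the all-against-all step grows quadratically with the number of sequences.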

