Abstract

Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU) ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags), a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interface program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods, with a higher recall rate (sensitivity) as well as higher precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3) from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com.

Highlights

  • Marker gene sequencing is an increasingly common technique for profiling the taxonomic composition and diversity of environmental samples

  • CREST includes: 1. the manually curated SSU rRNA taxonomy and reference database SilvaMod, based on a modification of the taxonomic annotation used in SILVA SSURef nr release 106; 2. supplementary files for using the Greengenes taxonomy and database as an alternative; 3. a simple classification method based on pairwise alignment and assignment to the lowest common ancestor (LCA) of the resulting highest-scoring alignments; 4. implementations of the classification method as a web server and a command line tool (LCAClassifier); and 5. a new version of the program MEGAN [20] offering CREST classification

  • Ten-fold cross-validation tests indicate that CREST LCA classification achieves better recall and higher precision than the Ribosomal Database Project (RDP) Classifier and SINA, regardless of reference database (SilvaMod or Greengenes)
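
The classification idea named in the highlights (assignment to the lowest common ancestor of the highest-scoring alignments, combined with rank similarity criteria) can be illustrated with a minimal Python sketch. This is not the LCAClassifier implementation. The toy taxonomy, the `classify` function, the 2% score window and the per-rank identity cutoffs are all assumptions chosen for illustration, and none of these values are taken from the text above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> parent); not the SilvaMod taxonomy.
PARENT: Dict[str, Optional[str]] = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Enterobacterales": "Gammaproteobacteria",
    "Enterobacteriaceae": "Enterobacterales",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
}
RANK: Dict[str, str] = {
    "Bacteria": "domain",
    "Proteobacteria": "phylum",
    "Gammaproteobacteria": "class",
    "Enterobacterales": "order",
    "Enterobacteriaceae": "family",
    "Escherichia": "genus",
    "Salmonella": "genus",
}
# Assumed minimum percent identity required to assign at each rank.
MIN_IDENTITY: Dict[str, float] = {
    "domain": 0.0, "phylum": 80.0, "class": 85.0,
    "order": 90.0, "family": 95.0, "genus": 97.0,
}

Hit = Tuple[str, float, float]  # (taxon, alignment bit score, percent identity)


def lineage(taxon: str) -> List[str]:
    """Return the path from the root of the taxonomy down to `taxon`."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(taxa: List[str]) -> Optional[str]:
    """Return the deepest node shared by the lineages of all `taxa`."""
    lca = None
    for level in zip(*(lineage(t) for t in taxa)):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


def classify(hits: List[Hit], score_window: float = 0.02) -> Optional[str]:
    """Assign a query to a taxon from its alignment hits."""
    if not hits:
        return None
    # Keep only hits whose score is within `score_window` of the best score.
    best_score = max(score for _, score, _ in hits)
    top = [h for h in hits if h[1] >= best_score * (1.0 - score_window)]
    node = lowest_common_ancestor([taxon for taxon, _, _ in top])
    # Rank similarity criterion: move up until the identity cutoff is met.
    best_identity = max(identity for _, _, identity in top)
    while node is not None and best_identity < MIN_IDENTITY[RANK[node]]:
        node = PARENT[node]
    return node


if __name__ == "__main__":
    hits = [("Escherichia", 500.0, 98.5), ("Salmonella", 495.0, 98.0)]
    print(classify(hits))
```

In this sketch, two near-equal hits to different genera are resolved to their shared family, while a single hit with low identity is moved up to a higher rank instead of being assigned to the genus of its best match. That second behaviour is one way a rank similarity criterion could flag sequences from potentially novel taxa.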



Introduction

Marker gene sequencing (also known as “barcoding” or “metabarcoding”) is an increasingly common technique for profiling the taxonomic composition and diversity of environmental samples. The technique has clear potential to revolutionize microbial ecology as well as medical microbiology through frequent, routine profiling of environmental and human microbiome samples. It has even been used in macroecology to monitor the distribution and dispersal of animal species [1]. In metagenomic or metatranscriptomic studies, sequences containing SSU rRNA or other markers can be subjected to taxonomic profiling [3,4,5]. None of the existing “next-generation” sequencing protocols available allow for full-length sequencing of the SSU rRNA gene
