Abstract

The high level of conservation of the 16S ribosomal RNA (16S rRNA) gene in all Prokaryotes makes this gene an ideal tool for the rapid identification and classification of these microorganisms. Databases such as the Ribosomal Database Project II (RDP-II) and the Greengenes Project offer access to sets of ribosomal RNA sequence databases useful for identifying microbes in culture-independent analyses of microbial communities. However, these databases do not contain all of the taxonomic levels attached to the published names of the bacterial and archaeal sequences. TaxCollector is a set of Python scripts that attaches taxonomic information to all 16S rRNA sequences in the RDP-II and Greengenes databases. The modified databases are referred to as TaxCollector databases; used in conjunction with BLAST, they allow rapid classification of sequences from any environmental or clinical source at six different taxonomic levels, from domain to species. The TaxCollector database prepared from the RDP-II database is an important component of a new 16S rRNA pipeline called PANGEA. The usefulness of TaxCollector databases is demonstrated with two very different datasets, one obtained from samples in a clinical setting and one from an agricultural soil. The six TaxCollector scripts are freely available at http://taxcollector.sourceforge.net and http://www.microgator.org.

Highlights

  • The sequencing and PCR amplification of 16S ribosomal RNA gene (16S rRNA) sequences have become the basis for the rapid identification and classification of Prokaryotes

  • In addition to the names and nodes files obtained from the National Center for Biotechnology Information (NCBI) and the full taxonomic information obtained from the Greengenes database in FASTA format (available at http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/), the file generated by acctotax.py was used by TaxCollectorGreengenes.py to generate the greengenesTC.fas database

  • The databases used for classification consisted of variants of the Ribosomal Database Project II (RDP-II) and Greengenes databases modified to contain the taxonomic information in the header of each sequence's published name (Figure 1)
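The idea of resolving a full lineage from the NCBI names and nodes files and writing it into a sequence header can be illustrated with a minimal Python sketch. This is not the TaxCollector code: the function names, the semicolon-separated header layout, and the default rank list are all assumptions made for illustration (the rank list in particular follows NCBI rank names and is not the paper's own set of levels). The only thing taken as given is the layout of the NCBI taxonomy dump files, whose fields are separated by a tab, a pipe, and a tab.

```python
# Hypothetical sketch: build a lineage from NCBI names.dmp / nodes.dmp
# and place it in a FASTA header. Not the actual TaxCollector scripts.

# Assumed rank list (NCBI rank names); the header layout is also an assumption.
RANKS = ("superkingdom", "phylum", "class", "order", "family", "genus", "species")


def parse_dmp(handle):
    """Yield the list of fields for each row of an NCBI taxonomy .dmp file."""
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):      # rows end with a trailing tab + pipe
            line = line[:-2]
        if line:
            yield line.split("\t|\t")


def load_nodes(handle):
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp."""
    return {row[0]: (row[1], row[2]) for row in parse_dmp(handle)}


def load_names(handle):
    """Map tax_id -> scientific name from names.dmp."""
    return {row[0]: row[1] for row in parse_dmp(handle)
            if row[3] == "scientific name"}


def lineage(tax_id, nodes, names, ranks=RANKS):
    """Walk parent links up to the root and collect names at the wanted ranks."""
    found = {}
    seen = set()
    while tax_id in nodes and tax_id not in seen:   # the root is its own parent
        seen.add(tax_id)
        parent, rank = nodes[tax_id]
        if rank in ranks:
            found[rank] = names.get(tax_id, "unknown")
        tax_id = parent
    return [found.get(rank, "unclassified") for rank in ranks]


def annotate_header(accession, tax_id, nodes, names, ranks=RANKS):
    """Return a FASTA header carrying the lineage followed by the accession."""
    return ">" + ";".join(lineage(tax_id, nodes, names, ranks) + [accession])
```

With the real dump files, `load_nodes(open("nodes.dmp"))` and `load_names(open("names.dmp"))` would supply the two dictionaries. Ranks missing from a lineage are filled with a placeholder so that every header has the same number of fields, which keeps downstream parsing of BLAST hits simple.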


Summary

Introduction

The sequencing and PCR amplification of 16S ribosomal RNA gene (16S rRNA) sequences have become the basis for the rapid identification and classification of Prokaryotes. Next-generation sequencing produces many thousands of short 16S rRNA reads per sample [2,3,4,5] that require a further classification step using specific 16S rRNA databases. Projects such as the Ribosomal Database Project II (RDP-II) at Michigan State University [6,7] and the Greengenes Project, maintained by the Lawrence Berkeley National Laboratory [8], offer access to sets of rRNA sequence databases useful in microbial population studies. Those databases can be downloaded directly from the project websites and tailored to the interests of the user, for example by selecting sequences from cultured and/or uncultured organisms. The TaxCollector code can be quickly adapted to any other gene with an available database.

Experimental Section
Utility of the TaxCollector modified databases using two datasets
RDP Classifier
Results
Discussion
Conclusions