CloVR-16S: Phylogenetic microbial community composition analysis based on 16S ribosomal RNA amplicon sequencing – standard operating procedure, version 1.0

W Florian Fricke, Owen White, Samuel Angiuoli, James White, The CloVR Team, Cesar Arze, Malcolm Matalka

doi:10.1038/npre.2011.5888.3


Open Access

https://doi.org/10.1038/npre.2011.5888.3


Journal: Nature Precedings	Publication Date: Oct 12, 2011
Citations: 18	License type: CC BY 3.0

Affiliation: University of Maryland, Baltimore

Abstract

The CloVR-16S pipeline employs several well-known phylogenetic tools and protocols for the analysis of 16S rRNA sequence datasets:

A) Mothur – a C++-based software package used for clustering 16S rRNA sequences into operational taxonomic units (OTUs). Mothur creates OTUs using a matrix that describes pairwise distances between representative aligned sequences and subsequently estimates within-sample diversity (alpha diversity);
B) The Ribosomal Database Project (RDP) naïve Bayesian classifier, which assigns each 16S sequence to a reference taxonomy with associated empirical probabilities based on oligonucleotide frequencies;
C) Qiime – a Python-based workflow package that allows for sequence processing and phylogenetic analysis using different methods, including phylogenetic distance (UniFrac), for within-sample (alpha diversity) and between-sample (beta diversity) analysis;
D) Metastats and custom R scripts, used to generate additional statistical and graphical evaluations.

Though some of the protocols used in CloVR-16S overlap in purpose (e.g. OTU clustering), the end user benefits from their complementary nature, as they focus on different aspects of the phylogenetic analysis. CloVR-16S accepts as input raw multiplex 454-pyrosequencer output, i.e. pooled pyro-tagged sequences from multiple samples, or, alternatively, pre-processed sequences from multiple samples in separate files. This protocol is available in CloVR beta versions 0.5 and 0.6.

Highlights

A) Mothur [1] – a C++-based software package used for clustering 16S rRNA sequences into operational taxonomic units (OTUs).
Requirements for pipeline input: To run the full CloVR-16S analysis track, the user has to provide at least two input files: a sequence file in FASTA format and a tab-delimited metadata file (.txt).
Sequence data may consist of a single .fasta file that contains sequences from multiple samples, each tagged with a sample-specific barcode, as is common in the 454 Amplicon Sequencing protocol.
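As a rough illustration of the two required inputs, the sketch below reads a FASTA stream and a tab-delimited metadata stream and checks only their basic shape. It is not part of CloVR-16S: the function names, the example reads, and the three metadata columns are hypothetical, and the column layout the pipeline actually expects is not given in this excerpt.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_metadata(handle):
    """Return the rows of a tab-delimited file, requiring a constant field count."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        raise ValueError("metadata file is empty")
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"row {number} has {len(row)} fields, expected {width}")
    return rows


# Hypothetical example inputs; real files would be opened with open(path).
fasta_text = ">read1\nACGTACGT\nACGT\n>read2\nTTGGCCAA\n"
metadata_text = "sampleA\tACGAGTGCGT\tgut\nsampleB\tACGCTCGACA\toral\n"

reads = list(read_fasta(io.StringIO(fasta_text)))
metadata = read_metadata(io.StringIO(metadata_text))
```

A check of this kind only catches structural problems (missing headers, rows with a different number of fields) before a run is started; it says nothing about whether the barcodes or sample names are valid for the pipeline.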

Summary

Metadata file requirements for runs on a single sequence pool

The following rules apply: 1. All entries are tab-delimited.

Sequence pre-processing: To check each read from the sequence pool for quality and to sort sequences based on the sample-specific barcodes, the "trim.seqs" program is used with the following parameters: "minlength=100" (minimum sequence length) and "maxhomop=8" (maximum homopolymer length). All length parameters refer to base pairs (bp). This step generates trimmed .fasta and .groups files, which are used in the downstream analysis.

To keep only those sequence reads that produce alignments of at least 50 bp, the "screen.seqs" command is run with the "minlength=50" option.

Using the default "furthest neighbor" option, the "cluster" command assigns sequence reads to OTUs based on the distance matrix generated in the previous step.

The "summary.single" command produces a summary file of various richness and diversity estimators for each sample.
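The effect of the two quality thresholds can be sketched in a few lines of Python. This is an illustration of the filtering criteria only, not Mothur's implementation: the function names are hypothetical, and it assumes that a read is kept when it is at least "minlength" bp long and contains no homopolymer run longer than "maxhomop". Barcode sorting is not covered.

```python
from itertools import groupby


def longest_homopolymer(sequence):
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(list(run)) for _, run in groupby(sequence.upper())), default=0)


def passes_filter(sequence, minlength=100, maxhomop=8):
    """True if the read meets the assumed length and homopolymer criteria."""
    return len(sequence) >= minlength and longest_homopolymer(sequence) <= maxhomop


# Hypothetical reads, used only to exercise the thresholds.
long_clean_read = "ACGT" * 30            # 120 bp, longest run is 1
short_read = "ACGT" * 10                 # 40 bp, below minlength
homopolymer_read = "ACGT" * 25 + "A" * 9  # 109 bp, contains a run of 9

kept = [read for read in (long_clean_read, short_read, homopolymer_read)
        if passes_filter(read)]
```

With the defaults taken from the parameters quoted above, only the first of the three example reads is kept.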

RDP classification of all sequence reads

Sequence processing and analysis with Qiime

Sequence pre-processing

Beta diversity sample analysis

Detection of differentially abundant features