Abstract
Targeting sequencing to genes involved in key environmental processes, i.e., ecofunctional genes, provides an opportunity to sample nature's gene guilds to greater depth and help link community structure to process-level outcomes. Vastly different approaches have been implemented for sequence processing and, ultimately, for taxonomic placement of these gene reads. The overall quality of next generation sequence analysis of functional genes is dependent on multiple steps and assumptions of unknown diversity. To illustrate current issues surrounding amplicon read processing we provide examples for three ecofunctional gene groups. A combination of in silico, environmental and cultured strain sequences was used to test new primers targeting the dioxin and dibenzofuran degrading genes dxnA1, dbfA1, and carAa. The majority of obtained environmental sequences were classified into novel sequence clusters, illustrating the discovery value of the approach. For the nitrite reductase step in denitrification, the well-known nirK primers exhibited deficiencies in reference database coverage, illustrating the need to refine primer-binding sites and/or to design multiple primers, while nirS primers exhibited bias against five phyla. Amino acid-based OTU clustering of these two N-cycle genes from soil samples yielded only 114 unique nirK and 45 unique nirS genus-level groupings, likely a reflection of constricted primer coverage. Finally, supervised and non-supervised OTU analysis methods were compared using the nifH gene of nitrogen fixation, with generally similar outcomes, but the clustering (non-supervised) method yielded higher diversity estimates and stronger site-based differences. High throughput amplicon sequencing can provide inexpensive and rapid access to nature's related sequences by circumventing the culturing barrier, but each unique gene requires individual considerations in terms of primer design and sequence processing and classification.
Highlights
Microbial community composition is most frequently assessed using the 16S rRNA gene marker, either in direct-targeted amplification or seed-based retrieval from metagenomic datasets
INITIAL SEQUENCE PROCESSING: To investigate the diversity of denitrifiers and test primer coverage in environmental samples, nirK was amplified using primers nirK517F/1055R (Chen et al., 2010) and nirS using cd3af (Michotey et al., 2000) and R3cd (Throbäck et al., 2004), with 9 bp tag sequences, from DNA extracted from six tallgrass prairie sites (34°58′54″N, 97°31′14″W) by freeze-grinding mechanical lysis (Zhou et al., 1996)
PolF and PolR are similar to Zf and Zr (Zehr and McReynolds, 1989), which we considered using, but were modified to be less degenerate while maintaining broad coverage of nifH cluster I
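To make "less degenerate" concrete, the sketch below counts how many distinct oligonucleotides a primer written in IUPAC ambiguity codes represents. The function name `degeneracy` and the two example strings are hypothetical illustrations; they are not the PolF/PolR or Zf/Zr sequences, which are not given here.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct sequences encoded by a degenerate primer."""
    # Each position multiplies the total by the number of bases it allows.
    return prod(len(IUPAC[base]) for base in primer.upper())


# Hypothetical primers for illustration only (not taken from the paper).
more_degenerate = "ATGNNRYC"   # two N positions, one R, one Y
less_degenerate = "ATGSBRYC"   # N positions narrowed to S and B

print(degeneracy(more_degenerate))  # 4 * 4 * 2 * 2 = 64
print(degeneracy(less_degenerate))  # 2 * 3 * 2 * 2 = 24
```

Narrowing an `N` to a two- or three-base code lowers the number of oligonucleotide variants in the primer pool, which is the trade-off against coverage that the highlight describes.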
Summary
Microbial community composition is most frequently assessed using the 16S rRNA gene marker, either in direct-targeted amplification or seed-based retrieval from metagenomic datasets. In gene-targeted amplicon sequence processing, protein-based clustering would be expected to better indicate functional relatedness. The 50% dissimilarity cutoff was chosen because reference sequences were clustered at approximately this distance to define primer design groups.
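As a rough illustration of clustering protein sequences at a dissimilarity cutoff, the sketch below uses a simple greedy scheme: each sequence joins the first cluster whose founding sequence is within the cutoff, otherwise it starts a new cluster. This is a simplified stand-in and not the clustering procedure used in the study. It assumes the sequences are already aligned to equal length and measures dissimilarity as the fraction of differing residues; the names `p_distance` and `greedy_cluster` and the toy sequences are hypothetical.

```python
from typing import List


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues between two aligned protein sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Skip alignment columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable positions: treat as maximally dissimilar
    return sum(x != y for x, y in pairs) / len(pairs)


def greedy_cluster(seqs: List[str], cutoff: float = 0.5) -> List[List[int]]:
    """Group sequence indices; the first member of each cluster is its founder."""
    clusters: List[List[int]] = []
    for i, seq in enumerate(seqs):
        for members in clusters:
            founder = seqs[members[0]]
            if p_distance(seq, founder) <= cutoff:
                members.append(i)
                break
        else:
            # No existing founder is close enough, so start a new cluster.
            clusters.append([i])
    return clusters


# Toy aligned peptides for illustration only.
toy = ["MKLVA", "MKLVG", "WWWWW", "MKWWW"]
print(greedy_cluster(toy, cutoff=0.5))  # [[0, 1], [2, 3]]
```

Greedy assignment depends on input order, so results can differ from hierarchical methods such as complete linkage; it is shown here only because it makes the role of the cutoff easy to see.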