NG-Tax 2.0: A Semantic Framework for High-Throughput Amplicon Analysis.

Wasin Poncheewin, Peter J Schaap, Jasper J Koehorst, Jesse C J Van Dam, Hauke Smidt, Gerben D A Hermes

doi:10.3389/fgene.2019.01366

Open Access

https://doi.org/10.3389/fgene.2019.01366

Journal: Frontiers in Genetics
Publication Date: Jan 23, 2020
Citations: 91
License type: CC BY 4.0

Affiliation: Wageningen University & Research

Abstract

NG-Tax 2.0 is a semantic framework for FAIR high-throughput analysis and classification of marker gene amplicon sequences, including bacterial and archaeal 16S ribosomal RNA (rRNA), eukaryotic 18S rRNA and ribosomal intergenic transcribed spacer sequences. It can directly use single or merged reads, paired-end reads and unmerged paired-end reads from long range fragments as input to generate de novo amplicon sequence variants (ASVs). Using the RDF data model, ASVs can be automatically stored in a graph database as objects that link ASV sequences with the full data-wise and element-wise provenance, thereby achieving the level of interoperability required to utilize such data to its full potential. The graph database can be queried directly, allowing for comparative analyses of thousands of samples, and is connected with an interactive Rshiny toolbox for analysis and visualization of (meta)data. Additionally, NG-Tax 2.0 exports an extended BIOM 1.0 (JSON) file as a starting point for further analyses by other means. The extended BIOM file is backwards compatible and contains new attribute types that record the command arguments used, the sequences of the ASVs formed, and classification confidence scores. The performance of NG-Tax 2.0 was compared with DADA2, using the plugin in the QIIME 2 analysis pipeline. Fourteen 16S rRNA gene amplicon mock community samples were obtained from the literature and evaluated. Precision of NG-Tax 2.0 was significantly higher, with an average of 0.95 vs 0.58 for QIIME2-DADA2, while recall was comparable, with an average of 0.85 and 0.77, respectively. NG-Tax 2.0 is written in Java. The code, the ontology, a Galaxy platform implementation, the analysis toolbox, tutorials and example SPARQL queries are freely available at http://wurssb.gitlab.io/ngtax under the MIT License.
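The precision and recall figures reported above come from comparing detected taxa against mock communities of known composition. The sketch below shows how such metrics are commonly computed from two sets of taxon labels. The excerpt does not state the taxonomic level or the exact matching rule used in the paper, so the set-based comparison, the function name `precision_recall`, and the example taxa are all illustrative assumptions.

```python
def precision_recall(expected, observed):
    """Return (precision, recall) for observed taxa against a known mock community.

    Assumption: taxa are compared as plain labels; the paper's exact
    matching rule is not given in this excerpt.
    """
    expected, observed = set(expected), set(observed)
    true_positives = len(expected & observed)
    # Precision: fraction of reported taxa that are really in the mock community.
    precision = true_positives / len(observed) if observed else 0.0
    # Recall: fraction of mock community taxa that were reported.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical mock community and tool output (not data from the paper).
mock_expected = {"Bacteroides", "Escherichia", "Lactobacillus", "Clostridium"}
tool_observed = {"Bacteroides", "Escherichia", "Lactobacillus",
                 "Pseudomonas", "Staphylococcus"}

precision, recall = precision_recall(mock_expected, tool_observed)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the five reported taxa are correct and three of the four expected taxa are found, so false positives lower precision while the one missed taxon lowers recall.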

Highlights

High-throughput sequencing technologies have empowered our ability to study complex environmental and host-associated microbial communities.
An amplicon sequence variant (ASV) can be separated from error-reads on the basis of the expectation that, owing to its biological origin, a real sequence variant is located at a fixed position in the amplicon sequence and is more likely to be observed repeatedly in those samples where the particular biological variant is present.
Erroneous ASVs are rejected if their read count does not exceed an experimentally defined dynamic threshold that takes the evenness of the distribution into account (Ramiro-Garcia et al., 2016).
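The rejection rule in the last highlight can be illustrated with a deliberately simplified filter. The sketch below uses a fixed relative-abundance cutoff per sample. It is not the evenness-aware dynamic threshold of Ramiro-Garcia et al. (2016), whose formula is not given in this excerpt. The function name `filter_asvs`, the parameter `min_fraction`, and the example reads are hypothetical.

```python
from collections import Counter


def filter_asvs(reads, min_fraction=0.005):
    """Keep sequences whose read count exceeds a fraction of the sample total.

    Assumption: a fixed fraction stands in for the paper's dynamic,
    evenness-aware threshold, which is not specified here.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    # A candidate is rejected if its count does not exceed the threshold.
    return {seq: n for seq, n in counts.items() if n / total > min_fraction}


# Hypothetical sample: one dominant variant, one minor variant, one singleton.
sample_reads = ["ACGT"] * 990 + ["ACGA"] * 9 + ["ACGC"] * 1
accepted = filter_asvs(sample_reads)
print(accepted)
```

With 1,000 reads and a 0.5% cutoff, the singleton is rejected as a likely error-read while the minor variant with nine reads is kept.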

Summary

Introduction

High-throughput sequencing technologies have empowered our ability to study complex environmental and host-associated microbial communities. One strategy to reduce the number of false taxonomic inferences caused by error-reads is to cluster amplicon reads by sequence identity into operational taxonomic units (a process called OTU-picking) at a user-defined identity threshold. To build these OTUs, centroid or seed sequence-based greedy clustering approaches are frequently used (Stackebrandt and Goebel, 1994; Konstantinidis and Tiedje, 2005; Godzik and Li, 2006; Edgar, 2010). Recent studies have shown that a de novo clustering approach using exact matches would yield better results (Ramiro-Garcia et al., 2016; Callahan et al., 2017). In the past, the accuracy of NG-Tax has been benchmarked against QIIME (Caporaso et al., 2010) using synthetic mock communities, and NG-Tax was shown to outperform it (Ramiro-Garcia et al., 2016).
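The difference between identity-threshold OTU-picking and exact-match grouping can be made concrete with a toy comparison. The sketch below is a minimal illustration, not the algorithm of NG-Tax or of any cited tool. It assumes equal-length, pre-aligned reads and measures identity as the fraction of matching positions. The function names and example sequences are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned reads."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_match_groups(reads):
    """Group reads that are identical; every distinct sequence stays separate."""
    return dict(Counter(reads))


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering: most abundant sequences seed the OTUs."""
    counts = Counter(reads)
    otus = {}  # centroid sequence -> total reads assigned to it
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n  # absorbed into an existing OTU
                break
        else:
            otus[seq] = n  # no centroid is close enough: start a new OTU
    return otus


# Hypothetical 40-nt reads: variant_b differs from variant_a at one position.
variant_a = "ACGT" * 10
variant_b = variant_a[:-1] + "A"
variant_c = "TTGCA" * 8
reads = [variant_a] * 50 + [variant_b] * 30 + [variant_c] * 20

print(len(exact_match_groups(reads)), "exact-match groups")
print(len(greedy_otus(reads)), "OTUs at 97% identity")
```

Here the single-nucleotide variant, supported by 30 reads, survives as its own group under exact matching but is folded into the dominant centroid at a 97% identity threshold, which is the loss of resolution that exact-match approaches aim to avoid.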

Methods

Results

Conclusion
