Abstract

The advent of high-throughput sequencing has enabled in-depth characterization of human and environmental microbiomes. Determining the taxonomic origin of microbial sequences is one of the first, and frequently the only, analyses performed on microbiome samples. Substantial research has focused on the development of methods for taxonomic annotation, often making trade-offs between computational efficiency and classification accuracy. A side effect of these efforts has been a reexamination of the bacterial taxonomy itself. Taxonomies developed prior to the genomic revolution captured complex relationships between organisms that went beyond uniform taxonomic levels such as species, genus, and family. Driven in part by the need to simplify computational workflows, the bacterial taxonomies used most commonly today have been regularized to fit within a standard set of seven taxonomic levels. Consequently, modern analyses of microbial communities are relatively coarse-grained. Few methods make classifications below the genus level, which limits our ability to capture biologically relevant signals. Here, we present ATLAS, a novel strategy for taxonomic annotation that uses significant outliers within database search results to group sequences in the database into partitions. These partitions capture the extent of taxonomic ambiguity within the classification of a sample. The ATLAS pipeline can be found on GitHub [https://github.com/shahnidhi/outlier_in_BLAST_hits]. We demonstrate that ATLAS provides similar annotations to phylogenetic placement methods, but with higher computational efficiency. When applied to human microbiome data, ATLAS is able to identify previously characterized taxonomic groupings, such as those in the class Clostridia and the genus Bacillus. Furthermore, the majority of partitions identified by ATLAS are at the subgenus level, replacing higher-level annotations with specific groups of species. These more precise partitions improve our detection power in determining differential abundance in microbiome association studies.

Highlights

  • The microbiome plays an important role in human and ecological health

  • Although our results focus on data from 16S rRNA gene surveys, ATLAS can be used with any marker gene sequencing data to characterize the taxonomic composition of a microbial community and to determine microbiome associations with human and ecological health

  • We compared the taxonomic assignments generated by ATLAS for the Human Microbiome Project (HMP) and Global Enteric Multicenter Study (GEMS) datasets to the labels generated by TIPP (Nguyen et al., 2014)


Summary

Introduction

The microbiome plays an important role in human and ecological health. One of the first steps in microbial characterization is taxonomic classification. Many microbiome studies involve extracting DNA from a microbial community and amplifying and sequencing the 16S rRNA gene, a gene encoding part of the ribosomal complex. This gene is highly conserved across prokaryotes and can be amplified even from previously unknown organisms. Phylogenetic approaches (Yang and Rannala, 2012) were used to build trees that relate organisms based on how they evolved from each other. These trees were independent of taxonomic annotation and were instead generated directly from sequencing data via neighbor-joining (Zhang and Sun, 2008), maximum parsimony (Fitch, 1971; Tamura et al., 2011), maximum likelihood (Stamatakis, 2006), or other methods. Because building a phylogenetic tree is computationally expensive, we often perform taxonomic annotation by searching against a reference database of “known” sequences instead.
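
A database search of this kind returns a ranked list of reference hits for each query sequence, and the practical question is which of the top hits are distinguishable from the rest. The sketch below illustrates one simple way to split such a list: cut at the largest drop in alignment score. This is an illustrative heuristic only, not the statistical outlier test that ATLAS applies; the function name, the `Hit` type, and the example scores are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical hit record: (reference sequence ID, alignment score).
Hit = Tuple[str, float]


def split_at_largest_gap(hits: List[Hit]) -> Tuple[List[Hit], List[Hit]]:
    """Split hits into (top group, remainder) at the largest score drop.

    Illustrative heuristic only; it is not the ATLAS outlier test.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    if len(ranked) < 2:
        return ranked, []
    # Score drop between each consecutive pair of ranked hits.
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    # Cut just after the hit that precedes the largest drop.
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return ranked[:cut], ranked[cut:]


# Made-up scores for five reference sequences.
hits = [("refA", 480.0), ("refB", 476.0), ("refC", 474.0),
        ("refD", 390.0), ("refE", 385.0)]
top, rest = split_at_largest_gap(hits)
print([name for name, _ in top])
```

In this toy input the three closely scoring references end up in the top group, which mirrors the idea that a query may be assignable only to a small set of candidate sequences rather than to a single one.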
