HmmUFOtu

Qi Zheng, Jacquelyn Meisel, Casey Bartow-McKenney, Elizabeth A Grice

doi:10.1145/3233547.3233612

Abstract

Over the last decade, joint advances in next-generation sequencing technology and bioinformatics pipelines have dramatically improved our understanding of host-associated and environmental microbiota. Standard microbiome community analysis typically involves amplicon sequencing of the prokaryotic 16S rRNA gene. These sequences are then clustered into operational taxonomic units (OTUs), both to support downstream diversity analyses and to reduce the computational burden so that datasets can be analyzed rapidly. Taxonomy is then assigned to all reads of an OTU based on the assignment of a representative read. Although straightforward in principle, present methods often rely on heuristics when constructing (or picking) OTUs to avoid computationally expensive algorithms, and they ignore prior knowledge of microbial phylogeny to further reduce computational complexity. Here, we present HmmUFOtu, a novel tool for processing 16S rRNA sequences that addresses major limitations of current OTU picking and taxonomy assignment methods. HmmUFOtu relies on rapid per-read phylogenetic placement, followed by OTU picking and taxonomic assignment based on the phylogeny of known taxa. By benchmarking on simulated, mock community, and real datasets, we show that HmmUFOtu achieves high assignment accuracy, sensitivity, specificity, and precision, even at species-level resolution. Compared to standard pipelines, HmmUFOtu more accurately recapitulates community diversity and composition. HmmUFOtu can perform taxonomic assignment of 1 million 16S sequencing reads against a species-resolution reference tree with ~200,000 nodes within 6 hours on a modest Linux workstation with 16 processors and 32 GB RAM. HmmUFOtu is written in C++98 and is freely available at https://github.com/Grice-Lab/HmmUFOtu/.

Full Text