Abstract

The diversity of microbiota is best explored by understanding the phylogenetic structure of microbial communities. Traditionally, sequence alignment has been used for phylogenetic inference. However, alignment-based approaches face significant challenges and limitations when massive amounts of data are analyzed. Over the past decade, alignment-free approaches have enabled genome-scale phylogenetic inference. Here we evaluate three alignment-free methods, ACS, CVTree, and Kr, for phylogenetic inference with 16S rRNA gene data. We use a taxonomic gold standard to compare the accuracy of alignment-free phylogenetic inference with that of common microbiome-wide phylogenetic inference pipelines based on PyNAST and MUSCLE alignments with FastTree and RAxML. We re-simulate fecal communities from Human Microbiome Project data to evaluate the performance of the methods on datasets with properties of real data. Our comparisons show that alignment-free methods are not inferior to alignment-based methods in producing accurate and robust phylogenetic trees. Moreover, consensus ensembles of alignment-free phylogenies are superior to those built from alignment-based methods in their ability to highlight community differences in low-power settings. In addition, the overall running times of alignment-based and alignment-free phylogenetic inference are comparable. Taken together, our empirical results suggest that alignment-free methods provide a viable approach for microbiome-wide phylogenetic inference.
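The abstract names ACS without describing how it works. As a rough illustration only, the sketch below computes an average-common-substring style distance: for each position in one sequence it finds the longest substring starting there that also occurs in the other sequence, averages those lengths, and turns the average into a distance. The log-length scaling, the correction by the sequence's self-comparison score, and all function names are assumptions modeled on how ACS is usually presented; this is not the implementation evaluated in the study.

```python
import math


def match_lengths(a: str, b: str) -> list[int]:
    """For each start i in a, the length of the longest prefix of a[i:] found in b.

    Naive sketch; practical ACS tools typically use suffix trees or suffix arrays.
    """
    lengths = []
    for i in range(len(a)):
        lo, hi = 0, len(a) - i
        # Binary search is valid: if a[i:i+k] occurs in b, every shorter prefix does too.
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if a[i:i + mid] in b:
                lo = mid
            else:
                hi = mid - 1
        lengths.append(lo)
    return lengths


def avg_common_substring(a: str, b: str) -> float:
    """L(a, b): mean match length over all start positions of a."""
    return sum(match_lengths(a, b)) / len(a)


def acs_one_way(a: str, b: str) -> float:
    """Asymmetric score log|b| / L(a, b), corrected by a's self-comparison (assumed form)."""
    l_ab = avg_common_substring(a, b)
    if l_ab == 0:
        return math.inf  # the sequences share no characters at all
    return math.log(len(b)) / l_ab - math.log(len(a)) / avg_common_substring(a, a)


def acs_distance(a: str, b: str) -> float:
    """Symmetrized distance: mean of the two one-way scores (0.0 for identical inputs)."""
    return (acs_one_way(a, b) + acs_one_way(b, a)) / 2


x = "ACGTACGTTGCA"
y = "ACGTACGTTGCC"  # one substitution at the last position
z = "TTTTGGGGCCCC"  # unrelated composition
print(acs_distance(x, y) < acs_distance(x, z))  # True
```

Because no alignment is computed, the cost depends only on substring matching, which is the property that makes such methods attractive for large 16S datasets.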

Highlights

  • After Carl Woese and collaborators began building phylogenies from small subunit (SSU) ribosomal RNA sequences [1], many biologists accepted sequence-based phylogenies as the standard for inferring the Tree of Life

  • We aim to extend the application of three alignment-free methods, ACS [17], CVTree [18], and Kr [19], to phylogenetic inference with 16S ribosomal RNA (rRNA) gene data
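CVTree is likewise named without detail. The sketch below illustrates the composition-vector idea as it is commonly described: compute k-mer frequencies, subtract the frequency expected from a (k-2)th-order Markov model built from the flanking (k-1)-mers and the central (k-2)-mer, and compare sequences by the cosine correlation of the resulting vectors. The default word length k = 5, the ACGT alphabet, the zero-norm fallback, and all function names are assumptions for illustration; this is not the CVTree software or the configuration used in the study.

```python
import math
from collections import Counter
from itertools import product


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of each k-mer observed in seq (overlapping windows)."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def composition_vector(seq: str, k: int = 5, alphabet: str = "ACGT") -> dict[str, float]:
    """Markov-corrected k-mer composition vector (CVTree-style sketch)."""
    if k < 3:
        raise ValueError("k must be at least 3")
    f_k, f_k1, f_k2 = (kmer_freqs(seq, j) for j in (k, k - 1, k - 2))
    vec = {}
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        middle = f_k2.get(w[1:-1], 0.0)
        # Expected frequency of w from its (k-1)-mer prefix/suffix and (k-2)-mer core.
        f0 = f_k1.get(w[:-1], 0.0) * f_k1.get(w[1:], 0.0) / middle if middle else 0.0
        if f0 > 0:
            # Relative deviation from expectation; absent-but-expected words give -1.
            vec[w] = (f_k.get(w, 0.0) - f0) / f0
    return vec


def cv_distance(a: str, b: str, k: int = 5) -> float:
    """D = (1 - C) / 2, where C is the cosine correlation of the composition vectors."""
    va, vb = composition_vector(a, k), composition_vector(b, k)
    dot = sum(v * vb.get(w, 0.0) for w, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.5  # correlation undefined; treated here as C = 0 (an arbitrary choice)
    return (1 - dot / (norm_a * norm_b)) / 2
```

Enumerating all 4^k words, rather than only the observed k-mers, keeps the components for words that the Markov model expects but the sequence lacks; a loop over observed k-mers alone would silently drop them.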


Summary

Introduction

Bacterial systematics has been a difficult problem because bacteria lack easily characterized morphological features. The ability to perform high-throughput sequencing of marker genes, such as the 16S rRNA gene, has enabled en masse microbial community surveys. Studies in this area have yielded valuable information for characterizing the human microbiome and for understanding its role in many diseases, such as irritable bowel syndrome, chronic obstructive pulmonary disease [6], obesity [7, 8], diabetes [9], psoriasis [10], cancer [11], and depression [12]. From the sequencing-informatics perspective, sequence alignment has remained an important algorithmic approach in microbiomics. However, concerns remain about the ability to infer reliable pairwise alignments and, subsequently, the multiple sequence alignments necessary for phylogenetic inference.
