Abstract

With the emergence of next-generation sequencing (NGS) technology, a large number of metagenomic studies have estimated bacterial composition via 16S ribosomal RNA (16S rRNA) amplicon sequencing. In particular, subsets of the hypervariable regions in 16S rRNA, such as V1–V2 and V3–V4, are targeted using high-throughput sequencing. The resulting sequences are assigned to a specific taxon based on sequence homology. Since such sequences are highly homologous or identical between species in the same genus, it is challenging to determine the exact species using 16S rRNA sequences only. Therefore, in this study, homologous species groups were defined to obtain the maximum species-level resolution achievable with 16S rRNA. For taxonomic assignment using 16S rRNA, three major 16S rRNA databases are used independently, since the lineage of certain bacteria is not consistent among these databases. On the basis of the NCBI taxonomy classification, we re-annotated inconsistent lineage information in the three major 16S rRNA databases. For each species, we constructed a consensus sequence model for each hypervariable region and determined homologous species groups, which consist of species that are indistinguishable in terms of sequence homology. Using a k-nearest neighbor method and the species consensus sequence models, the species-level taxonomy was determined. If the species determined is a member of a homologous species group, the species group is assigned instead of a specific species. Notably, the evaluation of our method using simulated and mock datasets showed a high correlation with the real bacterial composition. Furthermore, in the analysis of real microbiome samples, such as salivary and gut microbiome samples, our method successfully performed species-level profiling and identified differences in the bacterial composition between different phenotypic groups.
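The assignment logic summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the distance measure, the value of k, or how the groups are derived, so the k-mer profile distance, the nearest-neighbor choice (k = 1), the "identical consensus sequence" grouping criterion, and all names and toy sequences below are assumptions made for illustration only.

```python
from collections import Counter

def kmer_profile(seq, k=4):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def distance(p, q):
    """Bray-Curtis-style dissimilarity between two k-mer count profiles."""
    shared = sum(min(p[m], q[m]) for m in p.keys() & q.keys())
    total = sum(p.values()) + sum(q.values())
    return 1.0 - 2.0 * shared / total if total else 1.0

def homologous_groups(consensus):
    """Map each species to its group when several species share an identical
    consensus sequence (a simplified, assumed grouping criterion)."""
    by_seq = {}
    for species, seq in consensus.items():
        by_seq.setdefault(seq, []).append(species)
    return {
        species: tuple(sorted(members))
        for members in by_seq.values() if len(members) > 1
        for species in members
    }

def assign(read, consensus, groups, k=4):
    """Find the nearest consensus model (k = 1 nearest neighbor, assumed) and
    report its homologous species group if it belongs to one."""
    read_profile = kmer_profile(read, k)
    best = min(
        consensus,
        key=lambda s: (distance(read_profile, kmer_profile(consensus[s], k)), s),
    )
    return groups.get(best, best)

# Hypothetical consensus sequences for one hypervariable region.
consensus = {
    "Species A": "ACGTACGTTTGACCA",
    "Species B": "ACGTACGTTTGACCA",  # identical to Species A, so indistinguishable
    "Species C": "TTGGCCAATCGGATC",
}
groups = homologous_groups(consensus)

print(assign("ACGTACGTTTGACC", consensus, groups))   # read resembling A and B
print(assign("TTGGCCAATCGGATC", consensus, groups))  # read matching C exactly
```

In this toy setup, a read that matches Species A and Species B equally well is reported as the group rather than as either species, which mirrors the behavior the abstract describes for members of homologous species groups.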

Highlights

  • Metagenomics has been widely used to analyze microbial communities without cultivating strains (Breitbart et al, 2003; Schloss and Handelsman, 2003; Handelsman, 2004; Petrosino et al, 2009; Qin et al, 2010; Peng et al, 2019; Yang L. et al, 2019; Brumfield et al, 2020; Chung et al, 2020; Khachatryan et al, 2020)

  • The evaluation performed using simulated datasets and mock datasets showed a high correlation with the real bacterial composition

  • Using the 16S rRNA sequences from three major 16S rRNA databases, we investigated the consistency of the taxonomic lineage annotation

Introduction

Metagenomics has been widely used to analyze microbial communities without cultivating strains (Breitbart et al, 2003; Schloss and Handelsman, 2003; Handelsman, 2004; Petrosino et al, 2009; Qin et al, 2010; Peng et al, 2019; Yang L. et al, 2019; Brumfield et al, 2020; Chung et al, 2020; Khachatryan et al, 2020). The 16S ribosomal RNA (16S rRNA) gene has been regarded as an informative resource for identifying species and estimating bacterial composition, as it contains both regions that are well conserved and regions that are hypervariable among different species. The conserved regions can be used as primer sites to target specific hypervariable regions using targeted amplicon sequencing (Petrosino et al, 2009), whereas the hypervariable regions can be used to identify bacterial taxonomy using the sequence similarities between different species. Using 454 pyrosequencing (Petrosino et al, 2009; Cummings et al, 2013) and Illumina MiSeq technology (Wen et al, 2017; Ravi et al, 2018; Sessou et al, 2019), 16S rRNA analysis pipelines were built to estimate the bacterial composition of different species (Turnbaugh et al, 2007; Jumpstart Consortium Human Microbiome Project Data Generation Working Group, 2012). Several studies have been conducted to investigate how the analysis of different variable regions affects the estimation of bacterial composition (Sun et al, 2013; Johnson et al, 2019).
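The idea of using conserved regions as primer sites to isolate a hypervariable region can be sketched in silico as follows. This is an illustrative simplification: the gene fragment and both primer sequences are invented toy strings, not real 16S rRNA primers, and the function name `extract_region` is hypothetical. Matching is exact, whereas real primers are often degenerate and tolerate mismatches.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def extract_region(gene, forward_primer, reverse_primer):
    """Return the sequence between the two primer binding sites, or None
    if either primer does not match exactly."""
    start = gene.find(forward_primer)
    if start == -1:
        return None
    start += len(forward_primer)
    # The reverse primer anneals to the opposite strand, so its binding site
    # appears on the forward strand as its reverse complement.
    end = gene.find(reverse_complement(reverse_primer), start)
    if end == -1:
        return None
    return gene[start:end]

# Toy gene: flanking bases, conserved site, variable region, conserved site.
gene = "GGG" + "AACCGG" + "TTTACGTTT" + "CATGCA" + "AAA"
region = extract_region(gene, forward_primer="AACCGG", reverse_primer="TGCATG")
print(region)
```

The extracted segment is what a downstream classifier would compare across species, which is why the choice of primer pair (and therefore of hypervariable region) can change the estimated composition.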
