Abstract

Next-generation sequencing (NGS) technologies provide high-resolution data for analyzing pathogenic microbes. The main goal is to accurately detect the microbial composition and abundances in a sample. However, high similarity among sequences from different species and the presence of sequencing errors make this difficult. Numerous methods have been developed for quantifying microbial composition and abundance, but they are not versatile enough to analyze samples that contain mixtures of noise. In this paper, we propose a new computational method, PGMicroD, for detecting pathogenic microbial composition in a sample from NGS data. The method first filters out potentially mismapped reads and extracts multiple species-related features from 16S rRNA sequencing reads. It then trains a support vector machine (SVM) classifier to predict the microbial composition. Finally, it assigns all multiple-mapped sequencing reads to the references of the predicted species to estimate the abundance of each species. The performance of PGMicroD is evaluated on both simulated and real sequencing data and compared with several existing methods. The results demonstrate that the proposed method achieves superior performance. The software package of PGMicroD is available at https://github.com/BDanalysis/PGMicroD.
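
The final step of the pipeline, assigning multiple-mapped reads to the predicted species and deriving abundances, can be illustrated with a minimal sketch. This is not PGMicroD's actual implementation: the function name `estimate_abundance`, the input layout, and in particular the rule of splitting a multi-mapped read equally among its predicted candidate species are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def estimate_abundance(read_hits: Dict[str, Iterable[str]],
                       predicted: Set[str]) -> Dict[str, float]:
    """Relative abundance per predicted species.

    read_hits maps each read ID to the species its alignments hit.
    Hits to species outside `predicted` are ignored. A read that still
    hits several predicted species is split equally among them
    (an assumption of this sketch, not a documented PGMicroD rule).
    """
    counts = defaultdict(float)
    for hits in read_hits.values():
        candidates = set(hits) & predicted
        for species in candidates:
            counts[species] += 1.0 / len(candidates)
    total = sum(counts.values())
    if total == 0:
        return {species: 0.0 for species in predicted}
    return {species: counts[species] / total for species in predicted}


# Hypothetical input: four reads and the species each one maps to.
read_hits = {
    "read1": ["species_A"],
    "read2": ["species_A", "species_B"],  # multi-mapped read
    "read3": ["species_B", "species_C"],  # species_C was not predicted
    "read4": ["species_C"],               # contributes nothing
}
abundance = estimate_abundance(read_hits, predicted={"species_A", "species_B"})
```

Restricting candidates to the predicted species before counting is what keeps reads from unpredicted references from diluting the estimate; any weighting scheme other than the equal split could be substituted at the marked line.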

Highlights

  • In the last decade, metagenomics has emerged as a remarkable development in the study of microbial ecology (Lindner and Renard, 2013)

  • Noise species are very similar to the ground-truth species, so reads coming from the noise species align well to the reference database

  • Reads from noise species inflate the estimated abundance of the ground-truth species. This negative effect can be minimized by applying a mapping score filter, which removes most of the reads originating from unrelated species
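
A mapping score filter of the kind described in the last highlight can be sketched as follows. This is a hedged illustration rather than the authors' code: the `Alignment` record layout, the function name `filter_by_mapping_score`, and the threshold of 30 are all hypothetical choices for the example.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """One read-to-reference alignment (hypothetical record layout)."""
    read_id: str
    reference_id: str
    mapq: int  # mapping quality reported by the aligner


def filter_by_mapping_score(alignments: Iterable[Alignment],
                            min_mapq: int = 30) -> List[Alignment]:
    """Keep only alignments whose mapping quality is at least min_mapq."""
    return [a for a in alignments if a.mapq >= min_mapq]


# Hypothetical alignments; read2 maps poorly and is filtered out.
alignments = [
    Alignment("read1", "species_A", 60),
    Alignment("read2", "species_A", 12),
    Alignment("read3", "species_B", 42),
]
kept = filter_by_mapping_score(alignments, min_mapq=30)
```

The threshold trades sensitivity for specificity: a higher `min_mapq` discards more reads from similar noise species but also more genuine reads from the ground-truth species.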

Summary

Introduction

Metagenomics has emerged as a remarkable development in the study of microbial ecology (Lindner and Renard, 2013). Detecting pathogenic microbial composition (i.e., species and their abundances) is very important in this field, since it provides valuable information for treating pathogenic infections and for the fields of ecology and human health (Chaudhary et al., 2015). Next-generation sequencing (NGS) provides an unprecedented opportunity to explore the composition of microbes in a sample (Fuhrman, 2012). 16S rRNA from microbes contains both highly conserved and highly variable regions, providing a reliable substrate for species identification (Teeling and Glockner, 2012). BLAST (Scott and Madden, 2004) calculates similarity scores for local alignments of reads against reference sequences, which can be used to explore the microbial diversity of complex environments.

