Abstract

Background: Metagenome shotgun sequencing presents opportunities to identify organisms that may prevent or promote disease. The analysis of sample diversity is achieved by taxonomic identification of metagenomic reads followed by generating an abundance profile. Numerous tools have been developed based on different design principles. Tools achieving high precision can lack sensitivity in some applications. Conversely, tools with high sensitivity can suffer from low precision and require long computation time.

Methods: In this paper, we present WEVOTE (WEighted VOting Taxonomic idEntification), a method that classifies metagenome shotgun sequencing DNA reads based on an ensemble of existing methods using k-mer-based, marker-based, and naive-similarity-based approaches. Our evaluation on fourteen benchmarking datasets shows that WEVOTE improves the classification precision by reducing false positive annotations while preserving a high level of sensitivity.

Conclusions: WEVOTE is an efficient and automated tool that combines multiple individual taxonomic identification methods to produce more precise and sensitive microbial profiles. WEVOTE is developed primarily to identify reads generated by MetaGenome Shotgun sequencing. It is expandable and has the potential to incorporate additional tools to produce a more accurate taxonomic profile. WEVOTE was implemented using C++ and shell scripting and is available at www.github.com/aametwally/WEVOTE.
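To illustrate the general idea of classifying a read with an ensemble of tools, the sketch below shows one simple weighted vote. It is not WEVOTE's actual decision scheme, which the abstract does not specify. The function name `weighted_vote`, the tool names, the equal weights, and the 0.5 threshold are all assumptions made for illustration. A read is assigned a taxon only when the tools supporting that taxon carry more than the chosen share of the total weight, which is one way an ensemble can suppress false positives reported by a single tool.

```python
from typing import Dict, Optional


def weighted_vote(
    assignments: Dict[str, Optional[str]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the taxon backed by more than `threshold` of the total tool weight.

    `assignments` maps a (hypothetical) tool name to the taxon it reported for
    one read, or None if that tool left the read unclassified.
    """
    total = sum(weights[tool] for tool in assignments)
    if total == 0:
        return None

    # Accumulate the weight behind each reported taxon.
    support: Dict[str, float] = {}
    for tool, taxon in assignments.items():
        if taxon is not None:
            support[taxon] = support.get(taxon, 0.0) + weights[tool]
    if not support:
        return None

    best = max(support, key=support.get)
    # Unclassified tools still count toward the total, so weak agreement fails.
    return best if support[best] / total > threshold else None


# Hypothetical tools, one per approach named in the abstract.
weights = {"kmer_tool": 1.0, "marker_tool": 1.0, "similarity_tool": 1.0}

agreeing = {
    "kmer_tool": "Escherichia coli",
    "marker_tool": None,
    "similarity_tool": "Escherichia coli",
}
conflicting = {
    "kmer_tool": "Escherichia coli",
    "marker_tool": "Salmonella enterica",
    "similarity_tool": "Shigella flexneri",
}

print(weighted_vote(agreeing, weights))
print(weighted_vote(conflicting, weights))
```

In the first call two of the three equally weighted tools agree, so their taxon passes the threshold. In the second call no taxon has majority support and the read is left unclassified.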

Highlights

  • The microbiome plays a vital role in a broad range of host-related processes and has a significant effect on host health

  • WEighted VOting Taxonomic idEntification method (WEVOTE) is developed primarily to identify reads generated by MetaGenome Shotgun sequencing

  • The existing taxonomic identification methods of MetaGenome Shotgun (MGS) data can be primarily classified into four categories: methods based on naive-similarity, methods based on analyzing sequence alignment results, methods based on sequence composition, such as k-mers, and marker-based methods


Summary

Introduction

The microbiome plays a vital role in a broad range of host-related processes and has a significant effect on host health. Because the number of sequences in the database is enormous, naive-similarity-based methods have a high probability of finding a match, and they usually achieve a higher level of sensitivity than other methods [4, 5]. Although the taxonomic profiles obtained from naive-similarity-based methods have been shown to contain a large number of false positives [5, 6], many researchers still depend on them because they do not want to sacrifice a high level of sensitivity to obtain fewer false-positive annotations. Tools with high sensitivity can suffer from low precision and require long computation time.
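The trade-off described above rests on two standard definitions: precision is the fraction of reported annotations that are correct, and sensitivity is the fraction of true annotations that are recovered. The minimal sketch below computes both from read counts. The function names and the example counts are illustrative assumptions and are not results from the paper.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of reported annotations that are correct: TP / (TP + FP)."""
    reported = true_pos + false_pos
    return true_pos / reported if reported else 0.0


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true annotations that are recovered: TP / (TP + FN)."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


# Made-up counts for a tool that finds most true reads
# but also reports many false positives.
tp, fp, fn = 90, 30, 10
print(f"precision   = {precision(tp, fp):.2f}")
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
```

With these made-up counts the tool is highly sensitive yet only moderately precise, which is the pattern the text attributes to naive-similarity-based methods.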

