Abstract

Background: With advances in next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To enhance the accuracy of taxonomy assignments in metagenomic samples, we developed a method that allows a different mismatch probability for each genome.

Results: We extended the algorithm for taxonomic assignment of metagenomic sequence reads (TAMER) by developing an improved method that sets a different mismatch probability for each genome rather than imposing a single parameter on all genomes, thereby obtaining a greater degree of accuracy. This method, which we call TADIP (Taxonomic Assignment of metagenomics based on DIfferent Probabilities), was comprehensively tested on simulated and real datasets. The results indicate that TADIP improved on the performance of TAMER, especially in large, high-complexity datasets.

Conclusions: TADIP was developed as a statistical model to improve the estimation accuracy of taxonomy assignments. Owing to its varying mismatch probability setting and correlated variance matrix setting, it performed better than TAMER on high-complexity samples.

Highlights

  • With the advances in the next-generation sequencing technologies, researchers can rapidly examine the composition of samples from humans and their surroundings

  • Our study has demonstrated that TADIP performed better for high complexity samples, because TADIP takes into account the correlations among different genomes with different mismatch probabilities

  • TADIP was developed as a statistical model to improve the estimation accuracy of taxonomy assignments

Summary

Introduction

With advances in next-generation sequencing technologies, researchers can rapidly examine the composition of samples from humans and their surroundings. To analyze the resulting short reads, the Basic Local Alignment Search Tool (BLAST) is often used to identify regions of similarity between nucleotide or protein sequences by comparing sequence reads from a sample with sequences in reference databases. It assesses the [...] In the context of metagenomic analysis, BLAST can lead to inaccurate estimates when errors occur in taxonomy assignment [3,4,5]. MEGAN assigns matched reads to the least common ancestor in the taxonomy tree when there are multiple matches to different genomes [2]. Because it assigns short reads to one genome with the best match and ignores relevant biological information with weak statistical significance, MEGAN can lead to false findings. To address this issue, Jiang and colleagues [2] introduced TAMER, which assigns metagenomic sequence reads with a mixture model by estimating the probability for each read generated [...]
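To make the idea of a mixture model with genome-specific mismatch probabilities more concrete, here is a minimal sketch in Python. It is not the TAMER or TADIP implementation: the binomial mismatch likelihood, the plain EM updates, the function name `estimate_mixture`, and the toy input are all assumptions chosen for illustration, and the sketch does not model the correlated variance matrix mentioned in the abstract. It assumes every read has a mismatch count against every candidate genome (for example, taken from alignment output).

```python
def estimate_mixture(mismatches, read_len, n_iter=50, eps=1e-6):
    """Toy EM for a read-to-genome mixture model (illustrative assumption).

    mismatches[i][j] is the number of mismatches of read i against genome j.
    Returns (abundance, mismatch_prob), each a list with one entry per genome.
    """
    n_reads = len(mismatches)
    n_genomes = len(mismatches[0])
    abundance = [1.0 / n_genomes] * n_genomes   # mixing proportions
    mismatch_prob = [0.05] * n_genomes          # one probability per genome

    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each genome,
        # using a binomial likelihood for the observed mismatch count.
        resp = []
        for row in mismatches:
            weights = [
                abundance[j]
                * mismatch_prob[j] ** m
                * (1.0 - mismatch_prob[j]) ** (read_len - m)
                for j, m in enumerate(row)
            ]
            total = sum(weights)
            resp.append([w / total for w in weights])

        # M-step: update abundance and the genome-specific mismatch probability.
        for j in range(n_genomes):
            weight = sum(r[j] for r in resp)
            abundance[j] = weight / n_reads
            if weight > 0:
                rate = sum(r[j] * row[j] for r, row in zip(resp, mismatches))
                rate /= weight * read_len
                # Clamp so the likelihood never collapses to exactly 0 or 1.
                mismatch_prob[j] = min(max(rate, eps), 1.0 - eps)

    return abundance, mismatch_prob


# Hypothetical toy data: 60 reads that fit genome 0 well, 40 that fit genome 1.
reads = [[1, 10]] * 60 + [[20, 8]] * 40
abundance, mismatch_prob = estimate_mixture(reads, read_len=100)
print(abundance, mismatch_prob)
```

On this toy input the estimates should settle near abundances of 0.6 and 0.4 with mismatch probabilities of roughly 0.01 and 0.08. Forcing both genomes to share a single mismatch probability would instead yield one pooled value that fits neither genome, which is the kind of situation a per-genome parameter is meant to handle.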
