Abstract

Background

Metagenomic sequencing allows us to study the structure, diversity and ecology of microbial communities without the need to obtain pure cultures. In many metagenomics studies, the reads obtained from sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which all contigs are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for binning contigs only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species).

Results

In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves the binning results of existing tools but also supports assigning contigs to multiple bins.

Conclusion

GraphBin2 incorporates coverage information into the assembly graph to refine the binning results obtained from existing binning tools. GraphBin2 also enables the detection of contigs that may belong to multiple species. We show that GraphBin2 outperforms its predecessor GraphBin on both simulated and real datasets. GraphBin2 is freely available at https://github.com/Vini2/GraphBin2.
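To make the idea of inferring shared contigs from coverage more concrete, the following minimal sketch may help. It is not the GraphBin2 implementation. It assumes, as an illustration only, that a contig shared by several species has a coverage close to the sum of the average coverages of the bins it connects to in the assembly graph. The function name `infer_shared_bins`, its parameters and the `tolerance` threshold are hypothetical.

```python
from itertools import combinations


def infer_shared_bins(contig_cov, neighbour_bins, bin_cov, tolerance=0.1):
    """Guess which bins a contig may be shared by (illustrative only).

    contig_cov     -- coverage of the contig under test (must be > 0)
    neighbour_bins -- bins of the contigs adjacent to it in the assembly graph
    bin_cov        -- dict mapping bin id -> average coverage of that bin
    tolerance      -- hypothetical maximum relative error for a match

    Assumption (not stated in the text above): a shared contig has a
    coverage close to the sum of the coverages of the bins sharing it.
    Returns a sorted tuple of bin ids, or None if no combination fits.
    """
    best = None
    # Try every combination of two or more neighbouring bins.
    for k in range(2, len(neighbour_bins) + 1):
        for combo in combinations(sorted(neighbour_bins), k):
            total = sum(bin_cov[b] for b in combo)
            rel_err = abs(total - contig_cov) / contig_cov
            # Keep the combination with the smallest relative error.
            if rel_err <= tolerance and (best is None or rel_err < best[0]):
                best = (rel_err, combo)
    return None if best is None else best[1]


if __name__ == "__main__":
    bin_cov = {1: 20.0, 2: 35.0, 3: 80.0}
    print(infer_shared_bins(56.0, {1, 2, 3}, bin_cov))
    print(infer_shared_bins(20.0, {1, 2}, bin_cov))
```

The exhaustive search over combinations is only practical because a contig usually touches few bins; a real tool would also need to decide how bin coverage is estimated, which this sketch leaves to the caller.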

Highlights

  • Metagenomic sequencing allows us to study the structure, diversity and ecology in microbial communities without the necessity of obtaining pure cultures

  • GraphBin2 uses an improved label propagation algorithm that takes into consideration the distance and coverage of neighbouring contigs, compared to the label propagation algorithm used in GraphBin

  • The results showed that GraphBin2 achieves the best binning performance in both simulated and real datasets
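The second highlight describes label propagation that takes the distance and coverage of neighbouring contigs into account. A simplified, single-pass sketch of that idea is shown below. It is an assumption-laden illustration rather than the algorithm used by GraphBin2: each unlabelled contig takes the bin of the nearest labelled contig in the assembly graph, and ties in distance are broken by the smallest coverage difference. The function `propagate_labels`, the dictionary-based graph representation and the `max_depth` limit are all hypothetical.

```python
from collections import deque


def propagate_labels(graph, coverage, labels, max_depth=3):
    """Label unbinned contigs from nearby binned contigs (illustrative only).

    graph     -- dict mapping contig -> list of adjacent contigs
    coverage  -- dict mapping contig -> coverage value
    labels    -- dict mapping already binned contigs -> bin id
    max_depth -- hypothetical limit on the search distance in edges
    """
    result = dict(labels)
    for contig in graph:
        if contig in labels:
            continue
        best = None  # ((distance, coverage difference), bin id)
        seen = {contig}
        queue = deque([(contig, 0)])
        # Breadth-first search outwards from the unlabelled contig.
        while queue:
            node, depth = queue.popleft()
            if node in labels:
                key = (depth, abs(coverage[node] - coverage[contig]))
                if best is None or key < best[0]:
                    best = (key, labels[node])
                continue  # do not search past a labelled contig
            if depth == max_depth:
                continue
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
        if best is not None:
            result[contig] = best[1]
    return result


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"], "e": []}
    coverage = {"a": 10.0, "b": 11.0, "c": 50.0, "d": 12.0, "e": 30.0}
    labels = {"a": 1, "c": 2}
    print(propagate_labels(graph, coverage, labels))
```

In this toy graph, contig `b` is adjacent to both labelled contigs, so the coverage difference decides its bin; contig `e` has no neighbours and stays unlabelled. Labels are read only from the original `labels` input, so the result does not depend on the order in which contigs are visited.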


Summary

Introduction

Metagenomic sequencing allows us to study the structure, diversity and ecology of microbial communities without the need to obtain pure cultures. The reads obtained from metagenomic sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which all contigs are expected to come from the same species. To characterise the composition of a sample, we cluster metagenomic sequences into bins that represent different taxonomic groups such as species, genera or higher levels [3]. This process is known as metagenomics binning. Existing metagenomic contig-binning tools fall into two categories: (1) reference-based binning and (2) reference-free binning. Reference-free binning tools use

Mallawaarachchi et al. Algorithms Mol Biol (2021) 16:3
