Abstract

Motivation: Sequencing technologies allow microbial communities to be sequenced directly from the environment without prior culturing. Because assembly typically produces only genome fragments, known as contigs, it is crucial to group them into putative species for further taxonomic profiling and downstream functional analysis. Taxonomic analysis of microbial communities requires contig clustering, a process referred to as binning, which is still one of the most challenging tasks in metagenomic data analysis. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, sequencing errors, and the limitations that arise when binning contigs of different lengths.

Results: In this context we present MetaCon, a novel tool for unsupervised metagenomic contig binning based on probabilistic k-mer statistics and coverage. MetaCon uses a signature based on k-mer statistics that accounts for the different probability of appearance of a k-mer in different species; in addition, contigs of different lengths are clustered in two separate phases. The effectiveness of MetaCon is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, MaxBin and MetaBAT.
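As a rough illustration of what a k-mer-based contig signature can look like, the Python sketch below standardizes each observed k-mer count against the count expected from the contig's own nucleotide frequencies. This is not MetaCon's actual statistic: the function name `kmer_signature`, the independent-nucleotide background model, the binomial variance approximation and the default `k = 4` are all assumptions made for illustration. Coverage and the two-phase clustering by contig length are not shown.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict


def kmer_signature(contig: str, k: int = 4) -> Dict[str, float]:
    """Standardized k-mer counts for one contig (illustrative sketch only)."""
    seq = contig.upper()
    n = len(seq) - k + 1  # number of k-mer windows in the contig
    if n <= 0:
        raise ValueError("contig is shorter than k")

    # Observed count of every k-mer window in the contig.
    observed = Counter(seq[i:i + k] for i in range(n))

    # Nucleotide frequencies of this contig, used as a simple background model
    # (assumption: positions are independent and identically distributed).
    base_counts = Counter(b for b in seq if b in "ACGT")
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("contig contains no A, C, G or T")
    base_freq = {b: base_counts[b] / total for b in "ACGT"}

    signature = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        # Probability of the k-mer under the background model.
        p = 1.0
        for b in kmer:
            p *= base_freq[b]
        expected = n * p
        variance = n * p * (1.0 - p)  # binomial approximation
        if variance > 0:
            signature[kmer] = (observed[kmer] - expected) / sqrt(variance)
        else:
            signature[kmer] = 0.0  # k-mer is impossible or certain under the model
    return signature
```

Signatures of this kind are vectors of length 4^k, so contigs can be compared with any standard distance measure. Short contigs yield noisy counts, which is one reason for handling contigs of different lengths separately.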


Introduction

Studies in microbial ecology commonly experience a bottleneck effect due to difficulties in microbial isolation and cultivation [1]. Because most organisms are difficult to culture in a laboratory, alternative methods of analyzing microbial diversity are commonly used to study community structure and functionality. One such method is the sequencing of the collective genomes (metagenomics) of all microorganisms in an environment [2]. Metagenomics is the study of heterogeneous microbial samples (e.g. soil, water, human microbiome) extracted directly from the natural environment, with the primary goal of determining the taxonomic identity of the microorganisms residing in the samples. To further investigate the taxonomic structure of microbial samples, assembled sequence fragments, known as contigs, need to be grouped into bins that represent genomes. Accurate binning of the contigs is an essential problem in metagenomic studies.
