Abstract

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, metagenomic studies face the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low-coverage/low-abundance organisms and closely related taxa/strains. We introduce a new binning method, BinSanity, that uses the affinity propagation (AP) clustering algorithm to cluster assemblies by coverage, with composition-based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition-based and coverage-based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high-completion, low-redundancy bins corresponding with the published metagenome-assembled genomes.
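The two composition signals named in the abstract, tetranucleotide frequency and percent GC content, can be illustrated with a short Python sketch. This is not BinSanity's own code: the function names, the handling of ambiguous bases, and the choice not to merge reverse-complement tetramers are all assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"
# All 256 possible tetranucleotides in a fixed order, so that profiles
# computed for different contigs are directly comparable.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def gc_percent(seq):
    """Percent of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in BASES)
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def tetranucleotide_frequencies(seq):
    """Relative frequency of each 4-mer, counted over overlapping windows.

    Assumption: windows containing anything other than A/C/G/T (such as N)
    are skipped, and reverse complements are NOT merged.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]


# Toy sequence for demonstration only, not real data.
contig = "ATGCGCGCATATNNGGCC"
profile = tetranucleotide_frequencies(contig)
print(round(gc_percent(contig), 1), len(profile))
```

Each contig yields a 256-value frequency vector plus one GC value. A refinement step of the kind the abstract describes could cluster contigs on such vectors, which is how contigs with similar coverage but different source organisms might be separated.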

Highlights

  • Studies in microbial ecology commonly experience a bottleneck effect due to difficulties in microbial isolation and cultivation (Staley & Konopka, 1985)

  • Metagenomics can elucidate genomic potential, providing information on pathways, metabolism, and taxonomy, allowing for inferences about environmental context without cultivation (Meyer et al.)

  • The results of this study find that BinSanity can generate high-quality genomes from metagenomic datasets via an automated process, which will enhance our ability to understand complex microbial communities

How to cite this article: Graham et al (2017), BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

Introduction

Studies in microbial ecology commonly experience a bottleneck effect due to difficulties in microbial isolation and cultivation (Staley & Konopka, 1985). Because most organisms are difficult to culture in a laboratory setting, alternative methods of analyzing microbial diversity are commonly used to elucidate community structure and putative functionality. One such method is the sequencing of the collective genomes (metagenomics) of all microorganisms in an environment (Handelsman et al., 1998). Metagenomics can elucidate genomic potential, providing information on pathways, metabolism, and taxonomy, allowing for inferences about environmental context without cultivation (Meyer et al.). Current binning protocols encounter one or more of several issues: decreased accuracy for contigs below a size threshold, the need for human intervention to distinguish clusters, difficulty differentiating related microorganisms, or exclusion of low-coverage and low-abundance organisms (Alneberg et al., 2014; Bowers et al., 2015; Imelfort et al., 2014).

