Abstract

Improvements in DNA sequencing technology have increased the amount and quality of sequences that can be obtained from metagenomic samples, making it practical to extract individual microbial genomes from metagenomic assemblies (“binning”). However, while many tools and methods exist for unsupervised binning with various statistical algorithms, there are few options for visualizing the results, even though visualization is vital to exploratory data analysis. We have developed gbtools, a software package that allows users to visualize metagenomic assemblies by plotting coverage (sequencing depth) and GC values of contigs, and also to annotate the plots with taxonomic information. Different sets of annotations, including taxonomic assignments from conserved marker genes or SSU rRNA genes, can be imported simultaneously; users can choose which annotations to plot. Bins can be manually defined from plots, or imported from third-party binning tools and overlaid onto plots, so that results from different methods can be compared side-by-side. gbtools reports summary statistics of bins, including marker gene completeness, and allows the user to merge bins or subtract one bin from another. We illustrate some of the functions available in gbtools with two examples: the metagenome of Olavius algarvensis, a marine oligochaete worm that has up to five bacterial symbionts, and the metagenome of a synthetic mock community comprising 64 bacterial and archaeal strains. We show how instances of poor automated binning, sequencer GC% bias, and variation between samples can be quickly diagnosed by visualization, and demonstrate how the results from different binning tools can be combined and refined to yield manually curated bins with higher completeness. gbtools is open-source and written in R. The software package, documentation, and example data are freely available online at https://github.com/kbseah/genome-bin-tools.

Highlights

  • Metagenomics originated in the field of microbial ecology as a means to look into the function of whole communities, given that most environmental microbes are resistant to cultivation (Handelsman, 2004; Kunin et al, 2008; Teeling and Glockner, 2012)

  • Visualization is usually the first step in data exploration, and despite the sophistication of many of the current methods for unsupervised binning, it remains an important part of the metagenomics toolkit

  • Each cluster represents a single putative genome bin. Such a heuristic approach was used by Albertsen et al (2013) to extract 12 nearly complete genomes of uncultivated bacteria from an activated sludge community, with the aid of principal-components analysis of tetranucleotide frequencies and additional taxonomic information from marker genes overlaid on the plots


Summary

Introduction

Metagenomics originated in the field of microbial ecology as a means to look into the function of whole communities, given that most environmental microbes are resistant to cultivation (Handelsman, 2004; Kunin et al, 2008; Teeling and Glockner, 2012). Although early genome reconstructions from metagenomes relied on low-diversity samples (Tyson et al, 2004; Woyke et al, 2006), recent advances in high-throughput sequencing have vastly increased the sequencing depth that can be obtained with the same resources, and this has made it practical to bin individual genomes from increasingly diverse communities. When contigs are plotted by properties such as coverage and GC content, each cluster represents a single putative genome bin. Such a heuristic approach was used by Albertsen et al (2013) to extract 12 nearly complete genomes of uncultivated bacteria from an activated sludge community, with the aid of principal-components analysis of tetranucleotide frequencies and additional taxonomic information from marker genes overlaid on the plots. Visualization is also useful post hoc, to spot potential artifacts from imperfect binning and to verify or troubleshoot automated methods.
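
The idea of defining bins on a coverage-GC plot, and then adding or subtracting bins, can be illustrated with a minimal sketch. This is not gbtools code (gbtools is written in R, and its functions are not shown here); the function names `gc_fraction` and `select_bin`, the rectangular selection, and the toy contigs are all assumptions made for illustration. A bin is modeled simply as a set of contig names, so merging and subtracting bins become set operations.

```python
# Toy illustration only: contig names, sequences, and coverage values are made up.


def gc_fraction(seq):
    """Return the GC fraction of a sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def select_bin(contigs, gc_range, cov_range):
    """Return the names of contigs inside a rectangle on the coverage-GC plane."""
    gc_min, gc_max = gc_range
    cov_min, cov_max = cov_range
    return {
        name
        for name, (seq, coverage) in contigs.items()
        if gc_min <= gc_fraction(seq) <= gc_max and cov_min <= coverage <= cov_max
    }


# Hypothetical assembly: contig name -> (sequence, mean coverage).
contigs = {
    "c1": ("GGCCGGCCAT", 52.0),
    "c2": ("GCGCGCATGC", 48.5),
    "c3": ("ATATATGCAT", 12.0),
    "c4": ("AATTAACGTT", 10.5),
    "c5": ("GGCCATATNN", 50.0),
}

# Two overlapping selections, as a user might draw them on a plot.
high_gc_bin = select_bin(contigs, gc_range=(0.7, 0.9), cov_range=(40.0, 60.0))
high_cov_bin = select_bin(contigs, gc_range=(0.0, 1.0), cov_range=(40.0, 60.0))
low_cov_bin = select_bin(contigs, gc_range=(0.0, 1.0), cov_range=(5.0, 20.0))

# "Adding" bins is a set union; "subtracting" is a set difference.
merged_bin = high_gc_bin | low_cov_bin
leftover = high_cov_bin - high_gc_bin
```

Treating bins as sets keeps the comparison of results from different binning methods simple: contigs shared by two bins are their intersection, and contigs assigned by only one method are the difference.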
