PanACEA: a bioinformatics tool for the exploration and visualization of bacterial pan-chromosomes

Thomas H Clarke, Jason M Inman, Granger Sutton, Derrick E Fouts, Lauren M Brinkac

doi:10.1186/s12859-018-2250-y

Open Access

Abstract

Background: Bacterial pan-genomes, composed of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers' ability to locate and analyze these regions. Multiple software packages are available to visualize pan-genomes, but their ability to address these concerns is currently limited because they use only pre-computed data sets, prioritize core over variable gene clusters, or do not account for pan-chromosome positioning in the viewer.

Results: We introduce PanACEA (Pan-genome Atlas with Chromosome Explorer and Analyzer), which uses locally computed interactive web pages to view ordered pan-genome data. It consists of multi-tiered, hierarchical display pages that extend from pan-chromosomes to both core and variable regions to single genes. Regions and genes are functionally annotated to allow for rapid searching and visual identification of regions of interest, and user-supplied genomic phylogenies and metadata can optionally be incorporated. PanACEA's memory and time requirements are within the capacities of standard laptops. The capability of PanACEA as a research tool is demonstrated by highlighting a variable region important in differentiating strains of Enterobacter hormaechei.

Conclusions: PanACEA can rapidly translate the results of pan-chromosome programs into an intuitive and interactive visual representation. It will empower researchers to visually explore and identify regions of the pan-chromosome that are most biologically interesting, and to obtain publication-quality images of these regions.

Highlights

Bacterial pan-genomes, comprised of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important
None of the existing pan-genome visualization tools are geared toward a standalone, intuitive, pan-chromosome-based interactive browser that will enable researchers to navigate to those parts of the pan-genome that are most relevant to understanding strain-specific differences that may impact pathogenesis, antimicrobial resistance, and general fitness in a given environment
Data input: PanACEA uses Perl scripts and a tab-delimited, human-readable flat file that contains the information the scripts need to generate platform-independent visualizations: the gene order of the pan-chromosome "assemblies", including the flexible and core regions; detailed information about each gene; and the location of the gene sequences. Although this file can be created ad hoc, and the user manual describes its contents, the PanACEA software package includes a script designed to translate the output of pan-genome software packages into the PanACEA flat file (Fig. 1).
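To make the idea of such a flat file concrete, the following is a minimal Python sketch of reading a tab-delimited file that carries gene order, region type, per-gene annotation, and sequence location. The column names, identifiers, and file paths are assumptions invented for illustration; they are not the actual PanACEA flat-file format, which is described in the PanACEA user manual.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL layout, not the real PanACEA format: one row per gene with
# the assembly it belongs to, its order on that assembly, whether it lies in
# a core or flexible region, an annotation, and where its sequence is stored.
EXAMPLE = (
    "assembly\tposition\tregion_type\tgene_id\tannotation\tsequence_file\n"
    "chr1\t2\tflexible\tFGI_0001\tputative transposase\tseqs/FGI_0001.fasta\n"
    "chr1\t1\tcore\tCL_0001\tDNA polymerase III\tseqs/CL_0001.fasta\n"
    "chr1\t3\tcore\tCL_0002\tribosomal protein L2\tseqs/CL_0002.fasta\n"
)


def read_flat_file(handle):
    """Group rows by assembly and sort each group by gene position."""
    reader = csv.DictReader(handle, delimiter="\t")
    assemblies = defaultdict(list)
    for row in reader:
        row["position"] = int(row["position"])
        assemblies[row["assembly"]].append(row)
    for genes in assemblies.values():
        genes.sort(key=lambda gene: gene["position"])
    return dict(assemblies)


assemblies = read_flat_file(io.StringIO(EXAMPLE))
gene_order = [gene["gene_id"] for gene in assemblies["chr1"]]
flexible = [g["gene_id"] for g in assemblies["chr1"] if g["region_type"] == "flexible"]
```

Here `gene_order` holds the genes of the hypothetical assembly `chr1` in pan-chromosome order, and `flexible` lists only those lying in flexible regions. A viewer built on such a structure could walk each assembly in order and render core and flexible regions differently.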

Summary

Introduction

Bacterial pan-genomes, composed of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Multiple software packages are available to visualize pan-genomes, but their ability to help researchers locate and analyze these regions is currently limited because they use only pre-computed data sets, prioritize core over variable gene clusters, or do not account for pan-chromosome positioning in the viewer. A comparison of just six strains of Streptococcus agalactiae demonstrated that many more isolates are needed to capture strain diversity and helped define the concept of the bacterial pan-genome: the set of genes (core and variable) that are encoded within a bacterial species [1]. Tools have been developed to perform multiple genome comparisons by computing orthologous gene clusters and the resulting sets of core and variable genes. None of the existing pan-genome visualization tools is geared toward a standalone (i.e., client-side), intuitive, pan-chromosome-based interactive browser that enables researchers to navigate to those parts of the pan-genome that are most relevant to understanding strain-specific differences that may impact pathogenesis, antimicrobial resistance, and general fitness in a given environment.
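The distinction between core and variable genes can be illustrated with a small Python sketch. It assumes a strict definition in which a gene cluster is core only if it is present in every genome; pan-genome tools may apply other thresholds. The function name, cluster names, and strain names are hypothetical and chosen only for this example.

```python
def split_core_variable(presence):
    """Split gene clusters into core and variable sets.

    presence maps a cluster id to the set of genomes that contain it.
    Assumption: 'core' means present in every genome seen in the input.
    """
    genomes = set().union(*presence.values())
    core = sorted(c for c, members in presence.items() if members == genomes)
    variable = sorted(c for c in presence if c not in core)
    return core, variable


# Hypothetical presence/absence data for three strains.
presence = {
    "cluster_A": {"strain1", "strain2", "strain3"},
    "cluster_B": {"strain1", "strain3"},
    "cluster_C": {"strain1", "strain2", "strain3"},
    "cluster_D": {"strain2"},
}

core, variable = split_core_variable(presence)
```

With this input, `cluster_A` and `cluster_C` are classified as core because all three strains carry them, while `cluster_B` and `cluster_D` are variable. Adding more strains can only shrink the core set or leave it unchanged under this definition, which is one way to see why a handful of genomes may not capture a species' diversity.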

Results

Discussion

Conclusion