Abstract

Pangenome analyses facilitate the interpretation of the genetic diversity and evolutionary history of a taxon. However, there is an urgent and unmet need for new tools for advanced pangenome construction and visualization, especially for metagenomic data. Here, we present an integrated pipeline, named MetaPGN, for construction and graphical visualization of pangenome networks from either microbial genomes or metagenomes. Given either isolated genomes or metagenomic assemblies coupled with a reference genome of the targeted taxon, MetaPGN generates a pangenome as a topological network consisting of genes (nodes) and gene-gene genomic adjacencies (edges), whose biological information can be easily updated and retrieved. MetaPGN also includes a self-developed Cytoscape plugin for layout of and interaction with the resulting pangenome network, providing an intuitive and interactive interface for full exploration of genetic diversity. We demonstrate the utility of MetaPGN by constructing Escherichia coli pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations. With the ability to extract and visualize gene contents and gene-gene physical adjacencies of a specific taxon from large-scale metagenomic data, MetaPGN provides advantages in expanding pangenome analysis to uncultured microbial taxa.

Highlights

  • The concept of the pangenome, defined as the full complement of genes in a clade, was first introduced by Tettelin et al. in 2005 [1]

  • We demonstrate the utility of MetaPGN by constructing Escherichia coli pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations

  • The MetaPGN pipeline can be divided into two main parts. The first constructs a pangenome network composed of representative genes, through gene prediction, gene redundancy elimination, gene type determination, assembly recruitment, pairwise gene adjacency extraction, and pangenome network generation. The second visualizes the pangenome network in an organized way in Cytoscape [17] with a self-developed plugin, where nodes represent genes and edges indicate gene adjacencies (Fig. 1, Supplementary Fig. S1, Methods Section)
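
The network structure described above (genes as nodes, gene-gene adjacencies as edges) can be illustrated with a minimal Python sketch. This is not MetaPGN's implementation: the function name build_pangenome_network, the input layout, and all gene and strain identifiers are hypothetical. It assumes gene prediction and redundancy elimination have already been done, so that each predicted gene maps to a representative gene and each contig is a list of gene IDs in genomic order.

```python
from collections import Counter


def build_pangenome_network(assemblies, representative):
    """Build a toy pangenome network.

    assemblies: {sample: [contig, ...]}, each contig a list of gene IDs in
        genomic order (hypothetical input layout).
    representative: {gene_id: representative_gene_id}, assumed to come from
        an earlier redundancy-elimination step.
    Returns (nodes, edges): nodes maps each representative gene to the set of
    samples carrying it; edges counts how often two representative genes are
    physically adjacent on a contig.
    """
    nodes = {}
    edges = Counter()
    for sample, contigs in assemblies.items():
        for contig in contigs:
            reps = [representative.get(gene, gene) for gene in contig]
            for rep in reps:
                nodes.setdefault(rep, set()).add(sample)
            # Neighbouring genes on the same contig form an undirected edge.
            for left, right in zip(reps, reps[1:]):
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return nodes, edges


# Two hypothetical strains with the same genes in a different order.
assemblies = {
    "strainA": [["a1", "a2", "a3"]],
    "strainB": [["b1", "b3", "b2"]],
}
representative = {
    "a1": "g1", "b1": "g1",
    "a2": "g2", "b2": "g2",
    "a3": "g3", "b3": "g3",
}
nodes, edges = build_pangenome_network(assemblies, representative)
for (left, right), weight in sorted(edges.items()):
    print(left, right, weight)
```

In this toy input, the adjacency between g2 and g3 is shared by both strains, while g1-g2 and g1-g3 are each seen in only one strain. The edge list could be written to a tab-separated file and imported into a network viewer.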


Introduction

The concept of the pangenome, defined as the full complement of genes in a clade, was first introduced by Tettelin et al. in 2005 [1]. A number of methods and tools have since been proposed for pangenome analysis of genomic or metagenomic data. Typical pangenome tools such as GET_HOMOLOGUES [2] and PGAP [3] mainly focus on analyzing homologous gene families and calculating the core and accessory genes of a given taxon. These tools do not capture variation in gene-gene physical relationships, so it is difficult for users to track a homologous region across the input genomes, especially when there are many homologous regions and input genomes.
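
The limitation noted above can be made concrete with a small Python sketch. It is an illustration only, not how GET_HOMOLOGUES or PGAP work internally; the genome and gene names are hypothetical, and each genome is reduced to a single ordered list of gene-family labels. Two genomes with identical gene content are indistinguishable in a core/accessory analysis, yet differ in which genes are physically adjacent.

```python
def core_and_accessory(genomes):
    """Split gene families into core (present in all genomes) and accessory."""
    gene_sets = [set(order) for order in genomes.values()]
    core = set.intersection(*gene_sets)
    accessory = set.union(*gene_sets) - core
    return core, accessory


def adjacencies(order):
    """Return the set of undirected neighbouring gene pairs in one genome."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


# Hypothetical genomes: same four gene families, with geneB and geneC swapped.
genomes = {
    "genome1": ["geneA", "geneB", "geneC", "geneD"],
    "genome2": ["geneA", "geneC", "geneB", "geneD"],
}

core, accessory = core_and_accessory(genomes)
adj1 = adjacencies(genomes["genome1"])
adj2 = adjacencies(genomes["genome2"])
shared = adj1 & adj2

print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("shared adjacencies:", [sorted(pair) for pair in shared])
```

Here every gene family is core and none is accessory, so a presence/absence view reports no difference, while only one of the three adjacencies in each genome is shared.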
