Abstract

Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics, from comparisons of a few genomes to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of a dataset, determining the core (conserved), accessory (dispensable) and unique (strain-specific) gene pools of a species, tracing horizontal gene flux across strains and providing insight into species evolution. Existing pan-genome software tools suffer from limitations such as support for only limited datasets, difficult installation or demanding requirements, and inadequate functional features. Here we present BPGA (Bacterial Pan Genome Analysis tool), an ultra-fast computational pipeline with seven functional modules. In addition to routine pan-genome analyses, BPGA introduces a number of novel features for downstream analyses, such as core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG and COG mapping of core, accessory and unique genes. Other notable features include minimal running prerequisites, freedom to select the gene clustering method, ultra-fast execution, a user-friendly command line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains.
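The core/accessory/unique partition described above can be illustrated with a minimal Python sketch. It is not BPGA's implementation: the function name `classify_gene_families`, the input layout (a mapping from gene family to the set of strains carrying it) and the strain and family labels are all hypothetical, and the strict definitions used here (core = present in every genome, unique = present in exactly one) are an assumption for illustration.

```python
from typing import Dict, Set


def classify_gene_families(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each gene family as 'core', 'accessory' or 'unique'.

    presence maps a gene-family ID to the set of strains that carry it
    (hypothetical layout, assumed for this sketch).
    """
    labels = {}
    for family, strains in presence.items():
        count = len(strains)
        if not 1 <= count <= n_genomes:
            raise ValueError(f"{family}: found in {count} of {n_genomes} genomes")
        if count == n_genomes:
            labels[family] = "core"        # conserved in every genome
        elif count == 1:
            labels[family] = "unique"      # strain-specific
        else:
            labels[family] = "accessory"   # dispensable: some genomes, not all
    return labels


# Toy data: three hypothetical strains and three hypothetical gene families.
presence = {
    "famA": {"s1", "s2", "s3"},
    "famB": {"s1", "s3"},
    "famC": {"s2"},
}
labels = classify_gene_families(presence, n_genomes=3)
print(labels)
```

In practice the presence sets would come from the output of whichever orthologous clustering method is selected; the sketch only shows the bookkeeping that follows clustering.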

Highlights

  • A number of free tools and web servers are available for pan-genome analysis, but each has its own limitations, leaving room for further improvement

  • An option for applying the tools to a subset of the total dataset may facilitate identification of exclusive genetic features that can discriminate between different serological, ecological or pathogenic groups. In this context, we have developed an ultra-fast computational pipeline, BPGA (Bacterial Pan Genome Analysis Tool), with seven functional modules for comprehensive pan-genome studies and downstream analyses

  • In addition to all types of analyses offered by currently available tools, this pipeline provides novel features such as Exclusive Gene Family Analysis, KEGG Pathway Analysis, GC Content Analysis and Subset Analysis

Summary

Introduction

A number of free tools and web servers are available for pan-genome analysis, but each has its own limitations, leaving room for further improvement. There has been a pressing need to develop a new computational pipeline that offers fast and efficient formalisms for constructing the pan-genome through clustering of orthologous gene families and that enables various downstream analyses, such as mapping of the core, accessory and unique genes to COG categories and/or KEGG pathways, phylogenetic analysis, in silico multi locus sequence typing (MLST) and other relevant analyses. An option for applying the tools to a subset of the total dataset may facilitate identification of exclusive genetic features that can discriminate between different serological, ecological or pathogenic groups. In this context, we have developed an ultra-fast computational pipeline, BPGA (Bacterial Pan Genome Analysis Tool), with seven functional modules for comprehensive pan-genome studies and downstream analyses. Other notable features of BPGA include minimal running prerequisites, ease of handling, a user-friendly command line interface, freedom for the user to select the clustering method, high-quality image output and efficiency in terms of time cost.
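As a rough illustration of how a subset analysis might surface exclusive genetic features, the sketch below finds gene families that are exclusively present in, or exclusively absent from, a chosen group of strains. This is an assumed reading of "exclusive" (present in every subset strain and in no other strain, or the reverse), not a description of BPGA's own algorithm; the function name `exclusive_families` and all strain and family labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def exclusive_families(
    presence: Dict[str, Set[str]],
    subset: Set[str],
    all_strains: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (exclusively_present, exclusively_absent) gene families.

    presence maps a gene-family ID to the set of strains carrying it
    (hypothetical layout, assumed for this sketch).
    """
    subset = set(subset)
    rest = set(all_strains) - subset
    if not subset or not rest:
        raise ValueError("subset must be a non-empty proper subset of all_strains")

    # Present in every strain of the subset and in no strain outside it.
    present_only = sorted(
        fam for fam, strains in presence.items()
        if subset <= strains and not (strains & rest)
    )
    # Present in every strain outside the subset and in no strain inside it.
    absent_only = sorted(
        fam for fam, strains in presence.items()
        if rest <= strains and not (strains & subset)
    )
    return present_only, absent_only


# Toy data: strains s1 and s2 stand in for one hypothetical serological group.
presence = {
    "famA": {"s1", "s2", "s3", "s4"},
    "famB": {"s1", "s2"},
    "famC": {"s3", "s4"},
    "famD": {"s1"},
}
present_only, absent_only = exclusive_families(
    presence, subset={"s1", "s2"}, all_strains={"s1", "s2", "s3", "s4"}
)
print(present_only, absent_only)
```

Using set containment keeps the two criteria symmetric: swapping the subset with its complement swaps the two result lists.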

