PIRATE: A fast and scalable pangenomics toolbox for clustering diverged orthologues in bacteria.

Sion C Bayliss, Edward J Feil, Samuel K Sheppard, Nicola M Coyle, Harry A Thorpe

doi:10.1093/gigascience/giz119

Open Access

https://doi.org/10.1093/gigascience/giz119

Journal: GigaScience
Publication Date: Oct 1, 2019
Citations: 164
License type: CC BY 4.0

Affiliation: University of Bath

Abstract

Background: Cataloguing the distribution of genes within natural bacterial populations is essential for understanding evolutionary processes and the genetic basis of adaptation. Advances in whole genome sequencing technologies have led to a vast expansion in the number of bacterial genomes deposited in public databases. There is a pressing need for software that can cluster, catalogue and characterise genes, or other features, in increasingly large genomic datasets.

Results: Here we present a pangenomics toolbox, PIRATE (Pangenome Iterative Refinement and Threshold Evaluation), which identifies and classifies orthologous gene families in bacterial pangenomes over a wide range of sequence similarity thresholds. PIRATE builds upon recent scalable software developments to allow for the rapid interrogation of thousands of isolates. PIRATE clusters genes (or other annotated features) over a wide range of amino acid or nucleotide identity thresholds and uses the clustering information to rapidly identify paralogous gene families and putative fission/fusion events. Furthermore, PIRATE orders the pangenome using a directed graph, provides a measure of allelic variation, and estimates sequence divergence for each gene family.

Conclusions: We demonstrate that PIRATE scales linearly with both number of samples and computation resources, allowing for analysis of large genomic datasets, and compares favorably to other popular tools. PIRATE provides a robust framework for analysing bacterial pangenomes, from largely clonal to panmictic species.

Highlights

For most bacteria the complement of genes for a given species is far greater than the number of genes in any one strain.
Methodological differences lie primarily in the post-processing of clusters: Roary uses a single percentage identity threshold for MCL clustering and separates paralogs based on their neighboring genes, whereas PanX splits paralogous genes using an alignment- and tree-based method rather than the CD-HIT and BLAST approach used by Pangenome Iterative Refinement and Threshold Evaluation (PIRATE).
We present PIRATE, a toolbox for pangenomic analysis of bacterial genomes, which provides a framework for exploring gene diversity by defining genes using relaxed sequence similarity thresholds.

Summary

Introduction

For most bacteria the complement of genes for a given species is far greater than the number of genes in any one strain. Current approaches define genes on the basis of strict sequence identity thresholds [2,3,7,8], e-value cutoffs [5,6] or bit score ratios [4]. However, it is difficult to define a single identity threshold beyond which genes cease to belong to the same family, and a threshold that is too strict splits genuine families into several clusters. Such over-splitting is likely to be especially problematic in vertically acquired core genes that have undergone strong diversifying selection, or in horizontally acquired accessory genes from multiple source populations that share a distant common ancestor. This can lead to misleading impressions of pangenome size and composition.
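The effect of threshold choice can be illustrated with a toy example. The sketch below is not PIRATE's implementation: it uses plain single-linkage clustering on a hand-written table of pairwise identities, and the gene names, identity values and the function names `cluster_at_threshold` and `sweep` are all hypothetical. It only shows how a strict threshold can split a diverged family that a more relaxed threshold keeps together.

```python
from itertools import combinations

# Hypothetical pairwise amino acid identities (%) between four toy genes.
# geneC is a diverged member of the geneA/geneB family; geneD is unrelated.
IDENTITY = {
    ("geneA", "geneB"): 98.0,
    ("geneA", "geneC"): 72.0,
    ("geneB", "geneC"): 70.0,
    ("geneA", "geneD"): 30.0,
    ("geneB", "geneD"): 28.0,
    ("geneC", "geneD"): 31.0,
}
GENES = ["geneA", "geneB", "geneC", "geneD"]


def cluster_at_threshold(genes, identity, threshold):
    """Single-linkage clustering: link two genes if identity >= threshold."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the cluster representative,
        # shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        pct = identity.get((a, b), identity.get((b, a), 0.0))
        if pct >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


def sweep(genes, identity, thresholds):
    """Cluster the same genes at each identity threshold in turn."""
    return {t: cluster_at_threshold(genes, identity, t) for t in thresholds}


if __name__ == "__main__":
    for threshold, clusters in sweep(GENES, IDENTITY, [95, 70, 50]).items():
        print(threshold, clusters)
```

In this toy data, a 95% threshold places geneC in its own cluster, while 70% and 50% group it with geneA and geneB and still keep geneD separate. Comparing cluster membership across thresholds in this way is the general idea behind evaluating a range of thresholds rather than committing to one.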

