Abstract
Horizontal gene transfer (HGT) is a major source of variability in prokaryotic genomes. Regions of genome plasticity (RGPs) are clusters of genes located in highly variable genomic regions. Most of them arise from HGT and correspond to genomic islands (GIs). Studying these regions at the species level has become increasingly difficult with the deluge of genome data. To date, no methods are available to identify GIs using hundreds of genomes to explore their diversity. Here we present panRGP, a method that predicts RGPs using pangenome graphs built from all available genomes of a given species. It allows the study of thousands of genomes in order to access the diversity of RGPs and to predict spots of insertion. It gave the best predictions when benchmarked alongside other GI detection tools against a reference dataset. In addition, we illustrated its use on metagenome-assembled genomes by redefining the borders of the leuX tRNA hotspot, a well-studied spot of insertion in Escherichia coli. panRGP is a scalable and reliable tool to predict GIs and spots, making it an ideal approach for large comparative studies. The methods presented in the current work are available through the following software: https://github.com/labgem/PPanGGOLiN. Detailed results and scripts to compute the benchmark metrics are available at https://github.com/axbazin/panrgp_supdata.
Highlights
Horizontal gene transfer (HGT) is a major mechanism that shapes the gene repertoires of bacterial species, providing and maintaining diversity at the population level (Ochman et al, 2000; Niehus et al, 2015)
The panRGP methods for detecting Regions of Genome Plasticity (RGPs) and spots have been implemented in the PPanGGOLiN pangenomic software suite, available through GitHub under the CeCILL 2.1 open source license
To illustrate the potential of panRGP on Metagenome Assembled Genomes (MAGs), we studied the genomic context of a previously described hotspot in E. coli (Lescat et al, 2009) using a pangenome constructed from MAG sequences from a recently published metagenome dataset (Pasolli et al, 2019)
Summary
Horizontal gene transfer (HGT) is a major mechanism that shapes the gene repertoires of bacterial species, providing and maintaining diversity at the population level (Ochman et al, 2000; Niehus et al, 2015). The panRGP method predicts RGPs from a query genome that is annotated with a set of protein-coding genes. It uses as input a partitioned pangenome graph that is built from the genomes of related organisms, usually from the same species. This graph is based on the PPanGGOLiN data structure (Gautreau et al, 2020), where nodes are homologous gene families and edges indicate a relation of genetic contiguity. RGPs from different genomes can be grouped into spots of insertion based on their conserved flanking persistent genes, using the pangenome graph as explained in subsection 2.1.3.
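The two ideas in this summary, a graph whose nodes are gene families and whose edges record genetic contiguity, and the grouping of RGPs into spots by their flanking persistent genes, can be sketched as a toy example. This is a minimal illustration under strong assumptions, not the PPanGGOLiN implementation: the genomes, family names, and function names are hypothetical, each genome is a single ordered list of family identifiers, a "variable region" is simply a maximal run of non-persistent families (a stand-in for the actual RGP prediction, which is not described here), and a spot is keyed by the unordered pair of immediately flanking persistent families.

```python
from collections import defaultdict

# Hypothetical toy data: each genome is an ordered list of gene family ids.
# Upper-case families are treated as persistent, lower-case ones as variable.
genomes = {
    "g1": ["A", "B", "x1", "x2", "C", "D"],
    "g2": ["A", "B", "y1", "C", "D"],
    "g3": ["D", "C", "z1", "z2", "z3", "B", "A"],  # same locus, reverse order
    "g4": ["A", "B", "C", "D"],                    # no insertion
}
persistent = {"A", "B", "C", "D"}


def build_contiguity_graph(genomes):
    """Map each unordered pair of neighbouring families to the genomes
    in which the two families are adjacent (the edges of the graph)."""
    edges = defaultdict(set)
    for name, families in genomes.items():
        for left, right in zip(families, families[1:]):
            edges[frozenset((left, right))].add(name)
    return edges


def variable_runs(families, persistent):
    """Return (start, end) index pairs of maximal runs of non-persistent
    families. ASSUMPTION: a simplified stand-in for RGP prediction."""
    runs, start = [], None
    for i, fam in enumerate(families):
        if fam not in persistent:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(families)))
    return runs


def group_into_spots(genomes, persistent):
    """Group variable regions that share the same pair of flanking
    persistent families, regardless of orientation."""
    spots = defaultdict(list)
    for name, families in genomes.items():
        for start, end in variable_runs(families, persistent):
            if start == 0 or end == len(families):
                continue  # region touches a contig end: no border on one side
            key = frozenset((families[start - 1], families[end]))
            spots[key].append((name, families[start:end]))
    return spots


edges = build_contiguity_graph(genomes)
spots = group_into_spots(genomes, persistent)

for borders, regions in spots.items():
    print(sorted(borders), regions)
```

In this toy data the regions from g1, g2 and g3 all fall between families B and C, so they are grouped into one spot even though their gene content differs and g3 is in the opposite orientation. Keying on an unordered pair is what makes the grouping orientation-independent; relying only on the single nearest persistent family on each side is a deliberate simplification.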