Abstract

Genomes of a given bacterial species can show great variation in gene content and thus systematic analysis of the entire gene repertoire, termed the pan-genome, is important for understanding bacterial intra-species diversity, population genetics, and evolution. Here, we analyzed the pan-genome from 30 completely sequenced strains of the human gastric pathogen Helicobacter pylori belonging to various phylogeographic groups, focusing on 991 accessory (not fully conserved) orthologous groups (OGs). We developed a method to evaluate the mobility of genes within a genome, using the gene order in the syntenically conserved regions as a reference, and classified the 991 accessory OGs into five classes: Core, Stable, Intermediate, Mobile, and Unique. Phylogenetic networks based on the gene content of the Core and Stable classes are highly congruent with that created from the concatenated alignment of fully conserved core genes, in contrast to those of the Intermediate and Mobile classes, which show quite different topologies. By clustering the accessory OGs on the basis of phylogenetic pattern similarity and chromosomal proximity, we identified 60 co-occurring gene clusters (CGCs). In addition to known genomic islands, including the cag pathogenicity island, bacteriophages, and integrating conjugative elements, we identified some novel ones. One island encodes the TerY-phosphorylation triad, which includes the eukaryote-type protein kinase/phosphatase gene pair, and components of a type VII secretion system. Another contains a reverse-transcriptase homolog, which may be involved in the defense against phage infection through altruistic suicide. Many of the CGCs contained restriction-modification (RM) genes. Different RM systems sometimes occupied the same (orthologous) locus in different strains. We anticipate that our method will facilitate pan-genome studies in general and help identify novel genomic islands in various bacterial species.

Highlights

  • By adding the H. pylori reverse transcriptase (RT) sequence to the phylogenetic analysis of these RT homologs, we found that it is related to retroelements involved in abortive phage infection (Abi), an altruistic suicide of phage-infected cells to prevent secondary infection, which includes abiA and abiK genes in Lactococcus [52] (S4A Fig)

  • Phylogenetic analysis based on gene content is affected by horizontal gene transfer (HGT), which complicates interpretation


Introduction

Advances in DNA sequencing technology allow us to compare tens or even hundreds of genome sequences of related bacteria at once [1]. Although the sizes of the core genome and pan-genome have been successfully used as measures to evaluate intra-species diversity [6,7,8] and several tools have been developed for pan-genome analysis [9,10,11], these simple measures and tools alone are not sufficient to understand how each strain has evolved and how the presence or absence of each gene contributes to the phenotypic differences between strains. For these purposes, we need a more detailed yet systematic approach to investigate the whole repertoire of the pan-genome.
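To make the terms above concrete, the following minimal sketch computes core-genome and pan-genome sizes, and the presence/absence (phylogenetic) pattern of one orthologous group, from a toy data set. It is an illustration only and not the method developed in this study: the function names, the `singleton` label, and the strain and OG identifiers are all hypothetical, and it does not attempt the gene-mobility classification.

```python
from typing import Dict, Set, Tuple


def pan_genome_summary(strain_ogs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Summarize gene content given a mapping: strain name -> set of OG ids.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not strain_ogs:
        raise ValueError("at least one strain is required")
    gene_sets = list(strain_ogs.values())
    pan = set().union(*gene_sets)           # OGs present in any strain
    core = set.intersection(*gene_sets)     # OGs present in every strain
    accessory = pan - core                  # OGs not fully conserved
    # OGs found in exactly one strain (our own label, not the paper's class)
    singleton = {og for og in accessory
                 if sum(og in genes for genes in gene_sets) == 1}
    return {"pan": pan, "core": core,
            "accessory": accessory, "singleton": singleton}


def phylogenetic_pattern(og: str, strain_ogs: Dict[str, Set[str]]) -> Tuple[int, ...]:
    """Presence (1) / absence (0) of one OG across strains, sorted by strain name."""
    return tuple(int(og in strain_ogs[strain]) for strain in sorted(strain_ogs))


# Toy input: three made-up strains and five made-up OGs.
toy = {
    "strainA": {"og1", "og2", "og3"},
    "strainB": {"og1", "og2", "og4"},
    "strainC": {"og1", "og3", "og5"},
}

summary = pan_genome_summary(toy)
print("pan-genome size:", len(summary["pan"]))
print("core genome size:", len(summary["core"]))
print("accessory OGs:", sorted(summary["accessory"]))
print("pattern of og2:", phylogenetic_pattern("og2", toy))
```

These summary counts are the "simple measures" referred to above; they describe how much gene content varies but say nothing about where each accessory gene sits on the chromosome or how it moved, which is the gap a more detailed analysis has to fill.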

