Abstract

Background

Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes, in which horizontal gene transfers are common. Although core genome identification appears straightforward among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure to be a set of sufficiently long segments in which gene order is conserved, so that they are likely to have been inherited mainly through vertical transfer. We develop a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders.

Results

The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets (around 700 OGs). The definition of the genomic core based on gene order conservation was shown to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes exhibited the expected characteristics (being indigenous and sharing the same history) more strongly than the non-core genes.

Conclusion

The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identifying the genomic core among moderately related microbial genomes.
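To make "sufficiently long segments in which gene order is conserved" concrete, the following is a minimal Python sketch. It is not the authors' algorithm (which searches for the OG order that maximally retains conserved gene orders); it only illustrates the underlying idea under simplifying assumptions: genomes are linear lists of OG identifiers, strand is ignored, and an adjacency counts as conserved when it appears in at least `min_genomes` genomes. Conserved adjacencies are then chained along one reference genome, and only runs of at least `min_len` OGs are kept. The function names and parameters are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def adjacency_counts(genomes: Sequence[Sequence[Hashable]]) -> Counter:
    """Count, for each unordered OG pair, how many genomes place the two OGs next to each other."""
    counts: Counter = Counter()
    for genome in genomes:
        pairs = set()  # count each adjacency at most once per genome
        for a, b in zip(genome, genome[1:]):
            if a != b:
                # Unordered pair, so a local inversion (A-B vs. B-A) still matches.
                pairs.add(frozenset((a, b)))
        counts.update(pairs)
    return counts


def conserved_segments(
    reference: Sequence[Hashable],
    genomes: Sequence[Sequence[Hashable]],
    min_genomes: int,
    min_len: int,
) -> List[List[Hashable]]:
    """Split a reference OG order into runs joined by conserved adjacencies.

    Hypothetical simplification: linear genomes, orientation ignored, and the
    order is fixed by the reference instead of being optimized.
    """
    counts = adjacency_counts(genomes)
    segments: List[List[Hashable]] = []
    if not reference:
        return segments
    current = [reference[0]]
    for a, b in zip(reference, reference[1:]):
        if counts[frozenset((a, b))] >= min_genomes:
            current.append(b)  # adjacency is conserved: extend the segment
        else:
            if len(current) >= min_len:
                segments.append(current)  # keep only sufficiently long segments
            current = [b]
    if len(current) >= min_len:
        segments.append(current)
    return segments


if __name__ == "__main__":
    genomes = [
        ["A", "B", "C", "D", "E", "F"],
        ["A", "B", "C", "X", "E", "F"],
        ["C", "B", "A", "D", "E", "F"],
    ]
    print(conserved_segments(genomes[0], genomes, min_genomes=2, min_len=2))
    # → [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Note that a pure gene-content criterion would treat "D" identically whether or not its neighbours are preserved; the adjacency-based criterion instead keeps it only when it sits inside a conserved run, which is the intuition behind preferring gene order conservation.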

Highlights

  • Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common

  • The genes constituting a prokaryotic genome appear to be divided into two classes: a "core gene pool" that comprises intrinsic genes encoding proteins for basic cellular functions, and a "flexible gene pool" that comprises genes acquired by horizontal gene transfer (HGT) encoding proteins that function under particular conditions, such as

  • We compiled a set of orthologous groups (OGs) using the DomClust algorithm [24] on the Microbial Genome Database for Comparative Analysis (MBGD) server [14], and considered an OG "conserved" when it was present in at least half of the genomes
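The "present in at least half of the genomes" rule is a simple threshold on OG occupancy. As a minimal Python sketch, assuming the OG table has already been obtained (for example from MBGD) and reshaped into a hypothetical mapping from OG identifier to the set of genomes containing it, the filter could look like this; `conserved_ogs` and the input layout are illustrative, not part of MBGD or DomClust.

```python
from typing import Dict, Hashable, Set


def conserved_ogs(
    presence: Dict[str, Set[Hashable]],
    n_genomes: int,
    min_fraction: float = 0.5,
) -> Set[str]:
    """Return the OGs present in at least `min_fraction` of the `n_genomes` genomes.

    `presence` is a hypothetical mapping: OG id -> set of genome ids containing it.
    """
    return {
        og
        for og, members in presence.items()
        if len(members) >= min_fraction * n_genomes
    }


if __name__ == "__main__":
    presence = {
        "OG1": {"g1", "g2"},
        "OG2": {"g1"},
        "OG3": {"g1", "g2", "g3", "g4"},
    }
    print(sorted(conserved_ogs(presence, n_genomes=4)))  # → ['OG1', 'OG3']
```

With an odd number of genomes the threshold rounds up in effect: for five genomes an OG needs at least three members, since two is below half.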


Summary

Introduction

Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes, in which horizontal gene transfers are common. The term "core genome" has been used in various contexts. In intraspecific comparisons, the "core genome" is typically defined as the set of genes shared by all strains, while the "pan-genome" is defined as the union of genes contained in all the strains considered [16,17,18]. This definition of "core genome" can be applied to genus-level comparisons [19], and similar analyses have been conducted for comparisons of even more distantly related genomes [20,21]. In a strict sense, the term "genuine ortholog" is meaningful only when the genes have been transmitted vertically; in that sense, "core genome" and "genuine ortholog" are closely related concepts.
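The intraspecific definitions of core genome and pan-genome reduce to set operations. Assuming each strain is represented by the set of gene (or OG) identifiers it contains, the core genome is the intersection of those sets and the pan-genome is their union. The minimal Python sketch below illustrates this; `core_and_pan` and the strain data are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple


def core_and_pan(
    gene_sets: Dict[str, Set[Hashable]],
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (core, pan) for a mapping of strain id -> set of gene ids."""
    sets = list(gene_sets.values())
    if not sets:
        return set(), set()
    core = set.intersection(*sets)  # genes shared by all strains
    pan = set.union(*sets)          # genes found in any strain
    return core, pan


if __name__ == "__main__":
    strains = {
        "s1": {"a", "b", "c"},
        "s2": {"a", "b", "d"},
        "s3": {"a", "b", "c", "e"},
    }
    core, pan = core_and_pan(strains)
    print(sorted(core), sorted(pan))  # → ['a', 'b'] ['a', 'b', 'c', 'd', 'e']
```

A strict intersection shrinks quickly as more, or more distantly related, genomes are added, since a single missing gene in one strain removes it from the core. This is one reason looser criteria, such as presence in a fraction of the genomes, are used in broader comparisons.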
