Adaptive Genomic Features of Raoultella ornithinolytica LAM1 from the Geothermal Site of Los Azufres Reveal Potential for Heavy-Metal Bioremediation.
Raoultella ornithinolytica strain LAM1, a facultative anaerobe isolated from a metal-rich geothermal pond in Los Azufres, Mexico, grew in sodium arsenate at concentrations up to 1500 ppm (7.215 mM). Whole-genome sequencing yielded a 6.01-Mbp draft genome across 104 contigs, encoding 5744 predicted genes annotated with Prokka and NCBI PGAP. Among these, 99 genes were associated with resistance to arsenic, mercury, copper, zinc, cobalt, cadmium, nickel, and lead. We identified a complete ars operon (arsR-arsB-arsC-arsA-arsD) with three arsC paralogs; mer operon genes (four merA copies, merR); zntA-zntR; copA-cueO-cusA; and the metal homeostasis system nikABCDE-nikR. Functional classification assigned 27.9% of resistance genes to zinc, 20.9% to nickel, 18.6% to copper, and 16.3% to arsenic. Clusters of Orthologous Genes (COG) annotation revealed enrichment in ABC-type permeases (COG0601, COG1173, COG0444), metal ion efflux systems (COG1566), and redox enzymes. Growth on CHROMagar™ ESBL medium indicated β-lactamase activity. Comparative analysis with 508 publicly available R. ornithinolytica genomes confirmed conservation of core resistance operons and identified adaptations in sulfur metabolism (dsrB). These results are consistent with the ability of R. ornithinolytica strain LAM1 to survive in metal-contaminated geothermal environments and indicate its potential for bioremediation applications.
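The conversion between the two arsenate concentrations quoted above (1500 ppm and 7.215 mM) can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state which salt form was used, so the molar mass is an assumption (anhydrous trisodium arsenate, Na3AsO4, which reproduces the reported figure), and the function name is hypothetical. It also treats ppm as mg/L, which holds for dilute aqueous solutions.

```python
# Assumption: the salt is anhydrous trisodium arsenate (Na3AsO4).
# Standard atomic masses (g/mol): Na 22.990, As 74.922, O 15.999.
NA3ASO4_G_PER_MOL = 3 * 22.990 + 74.922 + 4 * 15.999  # about 207.888 g/mol


def ppm_to_millimolar(ppm, molar_mass_g_per_mol):
    """Convert ppm (taken as mg/L) to mmol/L.

    (mg/L) / (g/mol) = mmol/L, so no extra scaling factor is needed.
    """
    return ppm / molar_mass_g_per_mol


# Concentration reported in the abstract
print(round(ppm_to_millimolar(1500, NA3ASO4_G_PER_MOL), 3))  # 7.215
```

A hydrated or protonated salt form has a different molar mass and would give a different millimolar value for the same ppm, so the salt form matters when comparing tolerance levels across studies.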
- Research Article
150
- 10.1186/1745-6150-7-46
- Jan 1, 2012
- Biology Direct
Background: Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea. Results: The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer. Conclusions: The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time. Reviewers: This article was reviewed by (for complete reviews see the Reviewers’ Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
- Research Article
291
- 10.1186/1743-422x-6-223
- Dec 1, 2009
- Virology Journal
Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in isolation of new viruses and genome sequencing has substantially expanded the known NCLDV diversity, creating additional opportunities for comparative genomic analysis and a demand for a comprehensive classification of viral genes. Results: A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified, of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions. Conclusions: The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.
- Research Article
698
- 10.1093/nar/gkaa1018
- Nov 9, 2020
- Nucleic acids research
The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently, in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically, with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI's gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank\ENA\DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families with corresponding references and PDB links, where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs containing multiple paralogs, and continued refinement of COG annotations.
- Research Article
39
- 10.3390/ijms19092525
- Aug 25, 2018
- International Journal of Molecular Sciences
Morchella is a popular edible fungus worldwide due to its rich nutrition and unique flavor. Many research efforts have been made on the domestication and cultivation of Morchella all over the world. In recent years, the cultivation of Morchella was successfully commercialized in China. However, its biology is not well understood, which restricts the further development of the morel cultivation industry. In this paper, we performed de novo sequencing and assembly of the genomes of two monospores with different mating types (M04M24 and M04M26) isolated from the commercially cultivated strain M04. Gene annotation and comparative genome analysis were performed to study differences in carbohydrate-active enzyme (CAZyme) content, transcription factors, duplicated sequences, structure of mating-type sites, and differences at the gene and functional levels between the two monospore strains of M. importuna. Results showed that the de novo assembled haploid M04M24 and M04M26 genomes were 48.98 and 51.07 Mb, respectively. A complete fine physical map of M. importuna was obtained from genome coverage and gene completeness evaluation. A total of 10,852 and 10,902 common genes and 667 and 868 endemic genes were identified from the two monospore strains, respectively. The Gene Ontology (GO) and KAAS (KEGG Automatic Annotation Server) enrichment analyses showed that the endemic genes performed different functions. The two monospore strains had 99.22% collinearity with each other, accompanied by certain position and rearrangement events. Analysis of complete mating-type loci revealed that the two monospore M. importuna strains contained an independent mating-type structure and remained conserved in sequence and location. The phylogeny and divergence time of M. importuna were analyzed at the whole-genome level for the first time. 
The bifurcation time of morel and tuber was estimated to be 201.14 million years ago (Mya); the two monospore strains with a different mating type represented the evolution of different nuclei, and the single copy homologous genes between them were also different due to a genetic differentiation distance about 0.65 Mya. Compared with truffles, M. importuna had an extension of 28 clusters of orthologous genes (COGs) and a contraction of two COGs. The two different polar nuclei with different degrees of contraction and expansion suggested that they might have undergone different evolutionary processes. The different mating-type structures, together with the functional clustering and enrichment analysis results of the endemic genes of the two different polar nuclei, imply that M. importuna might be a heterothallic fungus and the interaction between the endemic genes may be necessary for its complete life history. Studies on the genome of M. importuna facilitate a better understanding of morel biology and evolution.
- Research Article
241
- 10.1093/bib/bbx117
- Sep 14, 2017
- Briefings in Bioinformatics
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
- Research Article
25
- 10.1128/jb.00058-21
- May 7, 2021
- Journal of Bacteriology
Ribosomal proteins (RPs) are highly conserved across the bacterial and archaeal domains. Although many RPs are essential for survival, genome analysis demonstrates the absence of some RP genes in many bacterial and archaeal genomes. Furthermore, global transposon mutagenesis and/or targeted deletion showed that elimination of some RP genes had only a moderate effect on the bacterial growth rate. Here, we systematically analyze the evolutionary conservation of RPs in prokaryotes by compiling the list of the ribosomal genes that are missing from one or more genomes in the recently updated version of the Clusters of Orthologous Genes (COG) database. Some of these absences occurred because the respective genes carried frameshifts, presumably, resulting from sequencing errors, while others were overlooked and not translated during genome annotation. Apart from these annotation errors, we identified multiple genuine losses of RP genes in a variety of bacteria and archaea. Some of these losses are clade-specific, whereas others occur in symbionts and parasites with dramatically reduced genomes. The lists of computationally and experimentally defined non-essential ribosomal genes show a substantial overlap, revealing a common trend in prokaryote ribosome evolution that could be linked to the architecture and assembly of the ribosomes. Thus, RPs that are located at the surface of the ribosome and/or are incorporated at a late stage of ribosome assembly are more likely to be non-essential and to be lost during microbial evolution, particularly, in the course of genome compaction. Importance: In many prokaryote genomes, one or more ribosomal protein (RP) genes are missing. Analysis of 1,309 prokaryote genomes included in the COG database shows that only about half of the RPs are universally conserved in bacteria and archaea. In contrast, up to 16 other RPs are missing in some genomes, primarily, tiny (<1 Mb) genomes of host-associated bacteria and archaea. Ten universal and nine archaea-specific ribosomal proteins show clear patterns of lineage-specific gene loss. Most of the RPs that are frequently lost from bacterial genomes are located on the ribosome periphery and are non-essential in Escherichia coli and Bacillus subtilis. These results reveal general trends and common constraints in the architecture and evolution of ribosomes in prokaryotes.
- Research Article
6
- 10.1371/journal.pone.0264374
- Mar 9, 2022
- PLOS ONE
Acinetobacter baumannii is an opportunistic gram-negative bacterium typically associated with hospital-acquired infection. It can also become multidrug-resistant (MDR), extensively drug-resistant (XDR), and pan drug-resistant (PDR) within a short period. Although A. baumannii has been documented extensively, the antibiotic-resistance mechanisms and virulence factors responsible for pathogenesis have not been entirely elucidated. This study investigated the drug resistance pattern and characterized the genomic sequence by de novo assembly of PDR A. baumannii strain VJR422, which was isolated from a catheter-sputum specimen. The results showed that the VJR422 strain was resistant to all existing antibiotics. Based on de novo assembly, whole-genome sequences showed a total genome size of 3,924,675 bp. In silico and conventional MLST analysis showed that the sequence type (ST) of this strain was new under the Oxford MLST scheme; it was designated ST1890. Moreover, we found 10,915 genes that could be classified into 45 categories by Gene Ontology (GO) analysis. There were 1,687 genes mapped to 34 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Clusters of Orthologous Genes (COG) annotation identified 3,189 genes of the VJR422 strain. Regarding virulence factors, a total of 59 were identified in the genome of the VJR422 strain using the Virulence Factors of Pathogenic Bacteria database (VFDB). The drug-resistance genes were investigated by searching the Comprehensive Antibiotic Resistance Database (CARD). The strain harbored antibiotic-resistance genes responsible for resistance to aminoglycosides, β-lactam-ring-containing drugs, erythromycin, and streptogramin. We also identified resistance-nodulation-cell division (RND) and major facilitator superfamily (MFS) genes associated with antibiotic efflux pumps. Overall, this study examined A. baumannii strain VJR422 using genome-level data, i.e., GO, COG, and KEGG annotations. The antibiotic-resistant genotype and phenotype as well as the presence of potential virulence-associated factors were investigated.
- Research Article
242
- 10.3390/life5010818
- Mar 10, 2015
- Life
With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for orthology identification combined with extensive manual curation, including incorporation of the results of several completed and ongoing research projects in archaeal genomics. A new level of classification is introduced, superclusters that unite two or more arCOGs and more completely reflect gene family evolution than individual, disconnected arCOGs. Assessment of the current archaeal genome annotation in public databases indicates that consistent use of arCOGs can significantly improve the annotation quality. In addition to their utility for genome annotation, arCOGs also are a platform for phylogenomic analysis. We explore this aspect of arCOGs by performing a phylogenomic study of the Thermococci that are traditionally viewed as the basal branch of the Euryarchaeota. The results of phylogenomic analysis that involved both comparison of multiple phylogenetic trees and a search for putative derived shared characters by using phyletic patterns extracted from the arCOGs reveal a likely evolutionary relationship between the Thermococci, Methanococci, and Methanobacteria. The arCOGs are expected to be instrumental for a comprehensive phylogenomic study of the archaea.
- Research Article
39
- 10.1038/s41438-021-00501-6
- Mar 10, 2021
- Horticulture Research
Dragon fruits are tropical fruits economically important for agricultural industries. As members of the family of Cactaceae, they have evolved to adapt to the arid environment. Here we report the draft genome of Hylocereus undatus, commercially known as the white-fleshed dragon fruit. The chromosomal level genome assembly contains 11 longest scaffolds corresponding to the 11 chromosomes of H. undatus. Genome annotation of H. undatus found ~29,000 protein-coding genes, similar to Carnegiea gigantea (saguaro). Whole-genome duplication (WGD) analysis revealed a WGD event in the last common ancestor of Cactaceae followed by extensive genome rearrangements. The divergence time between H. undatus and C. gigantea was estimated to be 9.18 MYA. Functional enrichment analysis of orthologous gene clusters (OGCs) in six Cactaceae plants found significantly enriched OGCs in drought resistance. Fruit flavor-related functions were overrepresented in OGCs that are significantly expanded in H. undatus. The H. undatus draft genome also enabled the discovery of carbohydrate and plant cell wall-related functional enrichment in dragon fruits treated with trypsin for a longer storage time. Lastly, genes of the betacyanin (a red-violet pigment and antioxidant with a very high concentration in dragon fruits) biosynthetic pathway were found to be co-localized on a 12 Mb region of one chromosome. The consequence may be a higher efficiency of betacyanin biosynthesis, which will need experimental validation in the future. The H. undatus draft genome will be a great resource to study various cactus plants.
- Research Article
- 10.3390/pathogens14020128
- Feb 1, 2025
- Pathogens (Basel, Switzerland)
Salmonella enterica serovar Enteritidis (S. Enteritidis) is one of the most common causes of bacterial foodborne infections worldwide. It has an extensive host range, including birds and humans, making it one of the most adaptable Salmonella serovars. This study aims to define the virulence gene profile of S. Enteritidis and identify genes critical to its host specificity. Currently, there is limited understanding of the molecular mechanisms that allow S. Enteritidis to continue as an important foodborne pathogen. To better understand the genes that may play a role in the host-specific virulence and/or fitness of S. Enteritidis, we first compiled a virulence gene profile-based genome analysis of sequenced S. Enteritidis strains isolated from shell eggs in our laboratory. This analysis was subsequently used to compare the representative genomes of Salmonella serovars with varying host ranges and S. Enteritidis genomes. The study involved a comprehensive and direct examination of the conservation of virulence and/or fitness factors, especially in a host-specific manner-an area that has not been previously explored. Key findings include the identification of 10 virulence-associated clusters of orthologous genes (COGs) specific to poultry-colonizing serovars and 12 virulence-associated COGs unique to human-colonizing serovars. Virulence/fitness-associated gene analysis identified more than 600 genes. The genome sequences of the two S. Enteritidis isolates were compared to those of the other serovars. Genome analysis revealed a core of 2817 COGs that were common to all the Salmonella serovars examined. Comparative genome analysis revealed that 10 virulence-associated COGs were specific to poultry-colonizing serovars, whereas 12 virulence-associated COGs were present in all human-colonizing serovars. Phylogenetic analyses further highlight the evolution of host specificity in S. Enteritidis. 
This study offers the first comprehensive analysis of genes that may be unique to and possibly essential for the colonization and/or pathogenesis of S. Enteritidis in various and specific hosts.
- Research Article
105
- 10.1101/gr.161901
- Feb 8, 2001
- Genome Research
Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only several operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is observed can provide valuable evolutionary and functional clues through multiple genome comparisons. A program for constructing gapped local alignments of conserved gene strings in two genomes was developed. The statistical significance of the local alignments was assessed using Monte Carlo simulations. Sets of local alignments were generated for all pairs of completely sequenced bacterial and archaeal genomes, and for each genome a template-anchored multiple alignment was constructed. In most pairwise genome comparisons, <10% of the genes in each genome belonged to conserved gene strings. When closely related pairs of species (i.e., two mycoplasmas) are excluded, the total coverage of genomes by conserved gene strings ranged from <5% for the cyanobacterium Synechocystis sp. to 24% for the minimal genome of Mycoplasma genitalium, and 23% in Thermotoga maritima. The coverage of the archaeal genomes was only slightly lower than that of bacterial genomes. The majority of the conserved gene strings are known operons, with the ribosomal superoperon being the top-scoring string in most genome comparisons. However, in some of the bacterial–archaeal pairs, the superoperon is rearranged to the extent that other operons, primarily those subject to horizontal transfer, show the greatest level of conservation, such as the archaeal-type H+-ATPase operon or ABC-type transport cassettes. The level of gene order conservation among prokaryotic genomes was compared to the cooccurrence of genomes in clusters of orthologous genes (COGs) and to the conservation of protein sequences themselves. Only limited correlation was observed between these evolutionary variables. Gene order conservation shows a much lower variance than the cooccurrence of genomes in COGs, which indicates that intragenome homogenization via recombination occurs in evolution much faster than intergenome homogenization via horizontal gene transfer and lineage-specific gene loss. The potential of using template-anchored multiple-genome alignments for predicting functions of uncharacterized genes was quantitatively assessed. Functions were predicted or significantly clarified for ∼90 COGs (∼4% of the total of 2414 analyzed COGs). The most significant predictions were obtained for the poorly characterized archaeal genomes; these include a previously uncharacterized restriction-modification system, a nuclease-helicase combination implicated in DNA repair, and the probable archaeal counterpart of the eukaryotic exosome. Multiple genome alignments are a resource for studies on operon rearrangement and disruption, which is central to our understanding of the evolution of prokaryotic genomes. Because of the rapid evolution of the gene order, the potential of genome alignment for prediction of gene functions is limited; nevertheless, such predictions significantly complement the results obtained through protein sequence and structure analysis.
- Research Article
17
- 10.1093/nar/gkae983
- Nov 4, 2024
- Nucleic acids research
The Clusters of Orthologous Genes (COG) database, originally created in 1997, has been updated to reflect the constantly growing collection of completely sequenced prokaryotic genomes. This update increased the genome coverage from 1309 to 2296 species, including 2103 bacteria and 193 archaea, in most cases, with a single representative genome per genus. This set covers all genera of bacteria and archaea that included organisms with 'complete genomes' as per NCBI databases in November 2023. The number of COGs has been expanded from 4877 to 4981, primarily by including protein families involved in bacterial protein secretion. Accordingly, COG pathways and functional groups now include secretion systems of types II through X, as well as Flp/Tad and type IV pili. These groupings allow straightforward identification and examination of the prokaryotic lineages that encompass, or lack, a particular secretion system. Other developments include improved annotations for the rRNA and tRNA modification proteins, multi-domain signal transduction proteins, and some previously uncharacterized protein families. The new version of COGs is available at https://www.ncbi.nlm.nih.gov/research/COG, as well as on the NCBI FTP site https://ftp.ncbi.nlm.nih.gov/pub/COG/, which also provides archived data from previous COG releases.
- Research Article
- 10.3389/fmicb.2025.1553679
- Jul 4, 2025
- Frontiers in microbiology
The family Bacillaceae is a phenotypically and phylogenetically heterogeneous group of bacteria with vast metabolic capability in carbohydrate degradation and secondary metabolite production. Deep marine sediments harbor highly diverse microorganisms that play important roles in the ecosystem. Here, we investigated the cultivable fraction of bacteria associated with the sediments of the South China Sea (n = 152). After obtaining candidate novel strains, morphological and physiological characterization was conducted for polyphasic taxonomy. Additionally, whole-genome sequencing, annotation and comparative genomic analysis were performed to identify their specific metabolic characteristics. As a result, seven novel members of the family Bacillaceae have been established: Pseudalkalibacillus nanhaiensis sp. nov. (Strain SCS-8T), Paraperibacillus marinus sp. nov. (Strain SCS-26T), Neobacillus oceani sp. nov. (Strain SCS-31T), Paraperibacillus esterisolvens sp. nov. (Strain SCS-37T), Nanhaiella sioensis gen. nov., sp. nov. (Strain SCS-151T), Rossellomorea sedimentorum sp. nov. (Strain SCS-153AT) and Peribacillus sedimenti sp. nov. (Strain SCS-155T). These novel strains display smaller genome sizes and distinctive characteristics. Cluster of Orthologous Genes (COG) annotation revealed a higher specific gene abundance in these strains in carbohydrate transport and metabolism (COG-G), secondary metabolite processes (COG-Q), and cell membrane-related functions (COG-M). These Bacillaceae species isolated from sediment differ from other reference strains in their capacity to degrade carbohydrates and produce biosynthetic products, suggesting that they have unique adaptation strategies to deep marine sediments.
- Research Article
170
- 10.1186/1745-6150-2-33
- Nov 27, 2007
- Biology Direct
Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes.
Results: New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems.
Conclusion: The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: .
Reviewers: This article was reviewed by Peer Bork, Patrick Forterre, and Purificacion Lopez-Garcia.
- New
- Research Article
- 10.1007/s00284-025-04596-1
- Nov 8, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04600-8
- Nov 8, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04589-0
- Nov 8, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04586-3
- Nov 8, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04595-2
- Nov 8, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04584-5
- Nov 5, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04585-4
- Nov 5, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04574-7
- Nov 5, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04588-1
- Nov 4, 2025
- Current microbiology
- New
- Research Article
- 10.1007/s00284-025-04583-6
- Nov 4, 2025
- Current microbiology