Detection of Dispersed Repeats in the Genomes of Bacteria from Different Phyla
In this study, we searched for dispersed repeats in the genomes of bacteria from 42 phyla using the iterative procedure method. The results revealed that each genome contained a family of repeats with lengths from 440 to 580 bases and copy numbers from 1.0 × 10³ to 1.4 × 10⁴, depending on the species. The detected repeats occupied from 17 to 72% of the bacterial genome, and more than 90% of them were superimposed as motifs on the coding sequences. The repeats contained conserved islands interspersed with weakly similar regions. Consensus sequences calculated for all the found repeats appeared to significantly differ among the bacteria. We hypothesize that the detected repeat families may be involved in the formation of the bacterial nucleoid.
- Research Article
5
- 10.1186/1471-2105-14-118
- Apr 4, 2013
- BMC Bioinformatics
Background: Pyrrolysine (the 22nd amino acid) is, in certain organisms and under certain circumstances, encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream of UAG has been suggested, but the structure does not seem to be present in all pyrrolysine-incorporating genes. Results: We propose a strategy to predict pyrrolysine-encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed and we propose a weighted combination of these features which best explains the currently known pyrrolysine-incorporating genes. We devote special attention to the effect of structural conservation and provide further evidence that structural conservation may be influential, but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine-incorporating genes. Conclusions: We propose a method for prediction of pyrrolysine-incorporating genes in genomes of bacteria and archaea, leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline which is available on request.
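The first step described above, collecting open reading frames interrupted by the amber codon, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a DNA string (so UAG appears as TAG), scans only the forward strand, treats ATG as the only start codon, and treats TAA and TGA as the only terminating stops. The function name `amber_interrupted_orfs` and the `min_codons` parameter are hypothetical, and the clustering and weighted ranking steps are not covered.

```python
# Stops that terminate an ORF in this sketch; TAG (amber) is read through.
STOPS_NON_AMBER = {"TAA", "TGA"}


def amber_interrupted_orfs(seq, min_codons=3):
    """Return (start, end, amber_positions) for forward-strand ORFs that
    read through at least one in-frame TAG before reaching TAA or TGA.

    Positions are 0-based; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            ambers = []
            end = None
            j = i + 3
            while j + 3 <= len(seq):
                codon = seq[j:j + 3]
                if codon == "TAG":
                    ambers.append(j)       # candidate pyrrolysine site
                elif codon in STOPS_NON_AMBER:
                    end = j + 3
                    break
                j += 3
            if end is None:
                break                      # no terminating stop in this frame
            if ambers and (end - i) // 3 >= min_codons:
                hits.append((i, end, ambers))
            i = end                        # resume after the stop codon
    return hits


if __name__ == "__main__":
    print(amber_interrupted_orfs("ATGAAATAGCCCTAA"))
```

Each reported ORF would then be a candidate for similarity clustering; a real analysis would also need the reverse strand and alternative start codons.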
- Research Article
226
- 10.1371/journal.pbio.1001033
- Mar 22, 2011
- PLoS Biology
Most genomes of bacteria contain toxin–antitoxin (TA) systems. These gene systems encode a toxic protein and its cognate antitoxin. Upon antitoxin degradation, the toxin induces cell stasis or death. TA systems have been linked with numerous functions, including growth modulation, genome maintenance, and stress response. Members of the epsilon/zeta TA family are found throughout the genomes of pathogenic bacteria and were shown not only to stabilize resistance plasmids but also to promote virulence. The broad distribution of epsilon/zeta systems implies that zeta toxins utilize a ubiquitous bacteriotoxic mechanism. However, whereas all other TA families known to date poison macromolecules involved in translation or replication, the target of zeta toxins remained inscrutable. We used in vivo techniques such as microscopy and permeability assays to show that pneumococcal zeta toxin PezT impairs cell wall synthesis and triggers autolysis in Escherichia coli. Subsequently, we demonstrated in vitro that zeta toxins in general phosphorylate the ubiquitous peptidoglycan precursor uridine diphosphate-N-acetylglucosamine (UNAG) and that this activity is counteracted by binding of antitoxin. After identification of the product we verified the kinase activity in vivo by analyzing metabolite extracts of cells poisoned by PezT using high pressure liquid chromatography (HPLC). We further show that phosphorylated UNAG inhibits MurA, the enzyme catalyzing the initial step in bacterial peptidoglycan biosynthesis. Additionally, we provide what is to our knowledge the first crystal structure of a zeta toxin bound to its substrate. We show that zeta toxins are novel kinases that poison bacteria through global inhibition of peptidoglycan synthesis. This provides a fundamental understanding of how epsilon/zeta TA systems stabilize mobile genetic elements. Additionally, our results imply a mechanism that connects activity of zeta toxin PezT to virulence of pneumococcal infections. Finally, we discuss how phosphorylated UNAG likely poisons additional pathways of bacterial cell wall synthesis, making it an attractive lead compound for development of new antibiotics.
- Research Article
7
- 10.2174/187152606778249917
- Sep 1, 2006
- Infectious Disorders - Drug Targets
Signature-tagged mutagenesis (STM) provided the first widely applicable high-throughput method for detecting conditionally essential genes in bacteria by using negative selection to screen large pools of transposon (Tn) mutants. STM requires no prior knowledge of the bacterium's genome sequence, and has been used to study a large number of Gram-positive and Gram-negative species, greatly expanding the repertoires of known virulence factors for these organisms. Originally, hybridization of radiolabelled probes to colony or dot blots was used to detect differences in populations of tagged mutants before and after growth under a selective condition. Modifications of the tag detection method involving polymerase chain reaction (PCR) amplification and visualisation by gel electrophoresis have been developed and can be automated through the use of robotics. Genetic footprinting is another negative selection technique that uses PCR amplification to detect loss of mutants from a pool. Unlike PCR-STM, this technique allows direct amplification of Tn-flanking sequences. However, it requires the bacterium's whole genome sequence in order to design specific primers for every gene of interest. More recently, a number of techniques have been described that combine the negative-selection principle of STM and genetic footprinting with the genome-wide screening power of DNA microarrays. These techniques, although also requiring whole genome sequences, use either a form of linker-mediated or semi-random PCR to amplify and label Tn-flanking regions for hybridization to microarrays. The superior sensitivity of microarray detection allows greater numbers of mutants to be screened per pool, as well as determination of the coverage/distribution of insertions in the library prior to screening, two significant advantages over STM.
- Research Article
31
- 10.1099/vir.0.064915-0
- Apr 11, 2014
- Journal of General Virology
Despite tremendous efforts to improve the methodology for constructing flavivirus infectious cDNAs, the manipulation of flavivirus cDNAs remains a difficult task in bacteria. Here, we successfully propagated DNA-launched type 2 dengue virus (DENV2) and Japanese encephalitis virus (JEV) infectious cDNAs by introducing seven repeats of the tetracycline-response element (7×TRE) and a minimal cytomegalovirus (CMVmin) promoter upstream of the viral genome. Insertion of the 7×TRE-CMVmin sequence upstream of the DENV2 or JEV genome decreased the cryptic E. coli promoter (ECP) activity of the viral genome in bacteria, as measured using fusion constructs containing DENV2 or JEV segments and the reporter gene Renilla luciferase in an empty vector. The growth kinetics of recombinant viruses derived from DNA-launched DENV2 and JEV infectious cDNAs were similar to those of parental viruses. Similarly, RNA-launched DENV2 infectious cDNAs were generated by inserting 7×TRE-CMVmin, five repeats of the GAL4 upstream activating sequence, or five repeats of BamHI linkers upstream of the DENV2 genome. All three tandem repeat sequences decreased the ECP activity of the DENV2 genome in bacteria. Notably, 7×TRE-CMVmin stabilized RNA-launched JEV infectious cDNAs and reduced the ECP activity of the JEV genome in bacteria. The growth kinetics of recombinant viruses derived from RNA-launched DENV2 and JEV infectious cDNAs displayed patterns similar to those of the parental viruses. These results support a novel methodology for constructing flavivirus infectious cDNAs, which will facilitate research in virology, viral pathogenesis and vaccine development of flaviviruses and other RNA viruses.
- Research Article
10
- 10.2174/156652306777592081
- Jun 1, 2006
- Current Gene Therapy
The herpes simplex virus (HSV) amplicon vector is a versatile plasmid-based gene delivery vehicle with a large transgene capacity (up to 150 kb) and the ability to infect a broad range of cell types. The vector system was originally developed by Frenkel and her colleagues in 1980. Ever since, a great deal of effort by various investigators has been directed at minimizing the toxicity associated with the inevitable contamination by helper virus. In 1996, Fraefel and his colleagues successfully devised a cosmid-based packaging system that was free of contamination by helper virus (so-called helper virus-free packaging). As helper, it used a set of 5 overlapping cosmid clones that covered the entire HSV genome but lacked the DNA packaging/cleavage signals. With the helper virus-free system, broader applications of the vector became possible. Cloning of the entire HSV genome in bacterial artificial chromosome (BAC) plasmids enabled stable maintenance and propagation of the helper HSV genome in bacteria. It also allowed for the development of BAC-based helper virus-free packaging systems. In this article, we review various versions of DNA-based methods to prepare HSV amplicon vectors free of helper virus contamination. We also examine recent advances in vector design, including methods of vector construction, hybrid amplicon vectors, and the infectious BAC system. Future directions in improving packaging systems and vector designs are discussed.
- Research Article
39
- 10.1103/physrevx.8.021023
- Apr 20, 2018
- Physical Review X
Many complex systems are modular. Such systems can be represented as "component systems", i.e., sets of elementary components, such as LEGO bricks in LEGO sets. The bricks found in a LEGO set reflect a target architecture, which can be built following a set-specific list of instructions. In other component systems, instead, the underlying functional design and constraints are not obvious a priori, and their detection is often a challenge of both scientific and practical importance, requiring a clear understanding of component statistics. Importantly, some quantitative invariants appear to be common to many component systems, most notably a common broad distribution of component abundances, which often resembles the well-known Zipf's law. Such "laws" affect in a general and non-trivial way the component statistics, potentially hindering the identification of system-specific functional constraints or generative processes. Here, we specifically focus on the statistics of shared components, i.e., the distribution of the number of components shared by different system-realizations, such as the common bricks found in different LEGO sets. To account for the effects of component heterogeneity, we consider a simple null model, which builds system-realizations by random draws from a universe of possible components. Under general assumptions on abundance heterogeneity, we provide analytical estimates of component occurrence, which quantify exhaustively the statistics of shared components. Surprisingly, this simple null model can positively explain important features of empirical component-occurrence distributions obtained from data on bacterial genomes, LEGO sets, and book chapters. Specific architectural features and functional constraints can be detected from occurrence patterns as deviations from these null predictions, as we show for the illustrative case of the "core" genome in bacteria.
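The null model described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analytical treatment: component abundances follow a Zipf-like law with a chosen exponent, each realization of size m is built by m independent draws with replacement, and the probability that a component with abundance p appears at least once is therefore 1 − (1 − p)^m. The function names `zipf_probs`, `expected_occurrence` and `simulate_occurrence` are hypothetical.

```python
import random


def zipf_probs(n_components, exponent=1.0):
    """Normalized Zipf-like abundances: p(rank) proportional to rank**-exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_components + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_occurrence(probs, sizes):
    """Expected number of realizations that contain each component.

    Assumes each realization of size m is m independent draws with
    replacement, so P(component present) = 1 - (1 - p)**m.
    """
    return [sum(1.0 - (1.0 - p) ** m for m in sizes) for p in probs]


def simulate_occurrence(probs, sizes, rng):
    """Count, for each component, the realizations in which it appears."""
    counts = [0] * len(probs)
    components = range(len(probs))
    for m in sizes:
        present = set(rng.choices(components, weights=probs, k=m))
        for c in present:
            counts[c] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    probs = zipf_probs(1000)
    sizes = [200] * 50            # 50 realizations of 200 draws each
    simulated = simulate_occurrence(probs, sizes, rng)
    predicted = expected_occurrence(probs, sizes)
    for rank in (1, 10, 100, 1000):
        print(rank, simulated[rank - 1], round(predicted[rank - 1], 2))
```

Comparing an empirical occurrence histogram against `expected_occurrence` is one way to look for deviations from the null, such as an excess of components present in every realization.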
- Research Article
- 10.1186/s12864-025-11717-4
- May 22, 2025
- BMC Genomics
Background: Most bacterial lipases in subfamily I.1/I.2 depend on a specific chaperone protein, lipase-specific foldase (Lif), for folding into their active form. In contrast, several Lif-independent lipases have been reported in subfamily I.1. Lif-independent lipases have the potential to be industrially useful owing to their ease of heterologous expression; however, no method has been reported to predict Lif-dependence for an arbitrary lipase. In this study, we comprehensively estimated the Lif-dependence of subfamily I.1/I.2. Results: To estimate Lif-dependence, we comprehensively analyzed the presence or absence of Lif genes in the genomes of bacteria from which the lipases were derived and integrated the results with those of phylogenetic analysis. We identified a range of lipases from the Pseudomonas fragi/Proteus vulgaris clade, which contained all known Lif-independent lipases and were enriched for lipases that did not coexist with Lif. Sequences and structural features conserved in the P. fragi/P. vulgaris clade and other lipases were identified, and the residues involved in Lif-dependence were inferred. Furthermore, we identified the Pseudoalteromonas shioyasakiensis clade, which is phylogenetically distinct from the P. fragi/P. vulgaris clade, as having no Lif in the genome of the bacterium from which the lipase was derived. The P. shioyasakiensis clade lipase, PliLip, was heterologously expressed in Escherichia coli in an active form, independent of Lif. Conclusions: In this study, we developed a method to predict Lif-dependence in any lipase belonging to subfamily I.1/I.2 and comprehensively extracted putative Lif-independent lipases from public databases. This study contributes to expanding the diversity of industrially available Lif-independent lipases and provides fundamental insights into the evolution of lipases and Lif.
- Research Article
3
- 10.1007/s00284-010-9783-2
- Oct 24, 2010
- Current Microbiology
Our understanding of the interaction between bacteria and humans is still incomplete. The recent availability of many microbial genomes and the human genome, together with the basic local alignment search tool (BLAST) family of programs, offers a new way to gain insight into this interaction. This study aims to determine whether sequence identity exists between the genomes of bacteria and humans, and to explain this phenomenon in terms of bacteriophages and other mobile genetic elements. BLAST searches of the genomes of bacteria, bacteriophages, and plasmids against the human genome were performed using the resources of the National Center for Biotechnology Information (NCBI). All studied bacteria contain variable numbers of short regions of sequence identity to the human genome, ranging from 27 to 84 nt in length. These regions were found at multiple sites within the human genome. Based on these short regions of sequence identity, we develop the hypothesis that viruses, especially bacteriophages, might play a significant role in shaping the genomes of bacteria and humans and contribute to the shared regions.
- Research Article
20
- 10.1128/mmbr.00170-22
- May 29, 2024
- Microbiology and molecular biology reviews : MMBR
SUMMARY: Functional genomics is the use of systematic gene perturbation approaches to determine the contributions of genes under conditions of interest. Although functional genomic strategies have been used in bacteria for decades, recent studies have taken advantage of CRISPR (clustered regularly interspaced short palindromic repeats) technologies, such as CRISPRi (CRISPR interference), that are capable of precisely modulating expression of all genes in the genome. Here, we discuss and review the use of CRISPRi and related technologies for bacterial functional genomics. We discuss the strengths and weaknesses of CRISPRi as well as design considerations for CRISPRi genetic screens. We also review examples of how CRISPRi screens have defined relevant genetic targets for medical and industrial applications. Finally, we outline a few of the many possible directions that could be pursued using CRISPR-based functional genomics in bacteria. Our view is that the most exciting screens and discoveries are yet to come.
- Research Article
43
- 10.1098/rstb.2009.0046
- Aug 12, 2009
- Philosophical Transactions of the Royal Society B: Biological Sciences
The genome sequence is an icon of early twenty-first century biology. Genomes of nearly 2000 cellular organisms, and from many thousands of organelles and viruses, are now in the public domain. For biological research in individual species, the genome sequence increasingly provides the common
- Research Article
56
- 10.1128/aem.00964-08
- Aug 29, 2008
- Applied and Environmental Microbiology
The ability to acquire diverse and abundant forms of iron would be expected to confer a survival advantage in the marine environment, where iron is scarce. Marine bacteria are known to use siderophores and inorganic iron, but their ability to use heme, an abundant intracellular iron form, has only been examined preliminarily. Microscilla marina, a cultured relative of a bacterial group frequently found on marine particulates, was used as a model organism to examine heme uptake. Searches of the genome revealed analogs to known heme transport proteins, and reverse transcription-quantitative PCR analysis of these genes showed that they were expressed and upregulated under iron stress and during growth on heme. M. marina was found to take up heme-bound iron and could grow on heme as a sole iron source, supporting the genetic evidence for heme transport. Similar putative heme transport components were identified in the genomes of diverse marine bacteria. These systems were found in the genomes of many bacteria thought to be particle associated but were lacking in known free-living organisms (e.g., Pelagibacter ubique and marine cyanobacteria). This distribution of transporters is consistent with the hydrophobic, light-sensitive nature of heme, suggesting that it is primarily available on phytoplankton or detritus or in nutrient-rich environments.
- Book Chapter
2
- 10.1128/9781555815813.ch16
- Apr 8, 2014
This chapter summarizes the knowledge about the genomes of psychrophilic bacteria, a subclass of the cold-adapted bacteria, with emphasis on the specific selective features relevant to cold adaptation. A detailed analysis of the general features of genomes and proteomes from psychrophilic bacteria is presented. All investigators involved in sequencing the genomes of psychrophilic bacteria looked for common features which would account for cold adaptation. The genomes of psychrophilic bacteria also have the counterpart of major chaperonins such as the essential GroES GroEL complex. A remarkable observation poses interesting questions about the role of this complex. In the presence of molecular oxygen (dioxygen), this has the consequence that reactive oxygen species (ROS) are more frequent and stable for a longer time. Membrane fluidity can be increased in two ways: either by incorporating unsaturated fatty acids or by including branched-chain fatty acids in the diglycerides. Photobacterium profundum SS9 was found to exhibit enhanced proportions of both monounsaturated and polyunsaturated fatty acids when grown at a decreased temperature or elevated pressure. Three main features can be observed in the genomes and proteomes of these organisms: a variety of means to cope with ROS, a multiplicity of nucleic acid folding and unfolding devices, and, finally, a bias in the amino acid composition of their proteome.
- Research Article
886
- 10.1046/j.1365-2958.2003.03580.x
- Jun 17, 2003
- Molecular Microbiology
Bacterial genome nucleotide sequences are being completed at a rapid and increasing rate. Integrated virus genomes (prophages) are common in such genomes. Fifty-one of the 82 such genomes published to date carry prophages, and these contain 230 recognizable putative prophages. Prophages can constitute as much as 10-20% of a bacterium's genome and are major contributors to differences between individuals within species. Many of these prophages appear to be defective and are in a state of mutational decay. Prophages, including defective ones, can contribute important biological properties to their bacterial hosts. Therefore, if we are to comprehend bacterial genomes fully, it is essential that we are able to recognize accurately and understand their prophages from nucleotide sequence analysis. Analysis of the evolution of prophages can shed light on the evolution of both bacteriophages and their hosts. Comparison of the Rac prophages in the sequenced genomes of three Escherichia coli strains and the Pnm prophages in two Neisseria meningitidis strains suggests that some prophages can lie in residence for very long times, perhaps millions of years, and that recombination events have occurred between related prophages that reside at different locations in a bacterium's genome. In addition, many genes in defective prophages remain functional, so a significant portion of the temperate bacteriophage gene pool resides in prophages.
- Abstract
- 10.1016/0016-0032(57)90124-2
- Dec 1, 1957
- Journal of the Franklin Institute
List of four-letter combinations valuable for coding purposes
- Research Article
10
- 10.1186/1471-2164-13-s8-s19
- Dec 1, 2012
- BMC Genomics
Background: Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in a genome sequence. However, the accuracy of previous border-finding methods based on entropy segmentation still needs to be improved. Methods: In this study, we first applied a new recursive entropic segmentation method to DNA sequences to obtain preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results: Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions: This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein-coding and non-coding regions in DNA sequences.
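The core idea of entropic segmentation can be illustrated with a minimal sketch that finds a single best cut by maximizing a Jensen-Rényi divergence between the left and right parts of a sequence. This is not the published method: it works directly on the symbols of the input string instead of the 22-symbol alphabet, performs one cut only (no recursion, no significance test), and assumes the common definition of the divergence as the Rényi entropy of the length-weighted mixture minus the weighted Rényi entropies of the parts. The names `renyi_entropy` and `best_cut`, and the defaults `alpha=0.5` and `min_len=10`, are hypothetical.

```python
import math
from collections import Counter


def renyi_entropy(counts, total, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    s = sum((c / total) ** alpha for c in counts.values() if c)
    return math.log2(s) / (1.0 - alpha)


def best_cut(seq, alpha=0.5, min_len=10):
    """Return (position, divergence) of the cut maximizing the
    Jensen-Rényi divergence between seq[:position] and seq[position:].

    Returns (None, 0.0) when no cut gives a positive divergence.
    """
    n = len(seq)
    whole = Counter(seq)
    h_whole = renyi_entropy(whole, n, alpha)
    left = Counter()
    right = Counter(whole)
    best_pos, best_div = None, 0.0
    for pos in range(1, n):
        sym = seq[pos - 1]
        left[sym] += 1            # move one symbol from right to left
        right[sym] -= 1
        if pos < min_len or n - pos < min_len:
            continue              # skip cuts that leave a tiny segment
        w_left = pos / n
        div = (h_whole
               - w_left * renyi_entropy(left, pos, alpha)
               - (1.0 - w_left) * renyi_entropy(right, n - pos, alpha))
        if div > best_div:
            best_pos, best_div = pos, div
    return best_pos, best_div


if __name__ == "__main__":
    print(best_cut("A" * 50 + "C" * 50))
```

A recursive segmenter would apply `best_cut` again to each side of an accepted cut and stop when the divergence falls below a chosen significance threshold.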