Construction of Dinucleotide Circular Codes Based on Nucleotide Probabilities
The construction of a circular code through a biological process, particularly a primitive one in the absence of the protein world, has remained an open problem since the discovery of a maximal [Formula: see text] self-complementary trinucleotide circular code in genes in 1996 (Arquès and Michel, 1996). Circular codes are defined by their ability to recover the correct reading frame of genes from any position. While a class of 216 such trinucleotide codes has been identified, the KL method (Koch and Lehman, 1997), based on products of nucleotide probabilities, generates only a restricted subclass of 88 [Formula: see text]-codes (Lacan and Michel, 2001). Revisiting this probabilistic framework 25 years later, we demonstrate that various classes of dinucleotide circular codes can be generated using a nucleotide probability product model (called Construction 2). We introduce the concept of transitive dinucleotide codes and prove new theorems characterizing their circularity and comma-free properties. Using codon usage from bacteria, archaea, and eukaryotes, two "universal" maximal dinucleotide circular codes are observed: [Formula: see text] at codon site [Formula: see text], and [Formula: see text] at codon site [Formula: see text], which can be deduced from [Formula: see text] by a 1-letter cyclic permutation [Formula: see text] or, equivalently, by the reversing permutation [Formula: see text]. Unexpectedly, we then show that, under the independence assumption, the dinucleotide code [Formula: see text] obtained through Construction 2 from the nucleotide frequencies at codon sites 1 and 2 is a maximal dinucleotide circular code and is equal to the observed dinucleotide code: [Formula: see text]. These findings support a theoretical model in which dinucleotide circular codes may have originated from statistical properties of primitive nucleotide distributions, providing insights into the possible emergence of the genetic code.
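As an illustration of the dinucleotide case, the sketch below uses a graph criterion for circularity: each dinucleotide xy in a code becomes an edge x → y between nucleotides, and the code is circular exactly when this graph has no cycle. The abstract does not spell out Construction 2, so `product_model_code` is a hypothetical stand-in, not the authors' definition. It assumes independence, so that P(xy) = p1(x)·p2(y) with p1 and p2 the nucleotide probabilities at codon sites 1 and 2, and keeps the more probable word from each pair {xy, yx}. For positive probabilities, p1(x)p2(y) ≥ p1(y)p2(x) is equivalent to p1(x)/p2(x) ≥ p1(y)/p2(y), so the kept words follow a single ordering of the four nucleotides. This is why such a rule always gives an acyclic graph, i.e. a maximal circular code of 6 words. The frequencies are made-up placeholders, not values from the paper.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def is_dinucleotide_circular(code):
    """Graph criterion: one vertex per nucleotide, an edge x -> y for each
    dinucleotide xy in the code; circular iff the graph has no cycle.
    Periodic words such as AA are self-loops, hence cycles."""
    graph = {n: set() for n in NUCLEOTIDES}
    for word in code:
        graph[word[0]].add(word[1])
    state = {n: "new" for n in NUCLEOTIDES}  # DFS colours: new / active / done

    def reaches_cycle(u):
        state[u] = "active"
        for v in graph[u]:
            if state[v] == "active" or (state[v] == "new" and reaches_cycle(v)):
                return True
        state[u] = "done"
        return False

    return not any(state[n] == "new" and reaches_cycle(n) for n in NUCLEOTIDES)


def cyclic_permutation(code):
    """1-letter cyclic permutation xy -> yx; for words of length 2 this
    is the same as reversing each word."""
    return {w[1:] + w[:1] for w in code}


def product_model_code(p1, p2):
    """HYPOTHETICAL stand-in for Construction 2: from each pair {xy, yx} of
    distinct nucleotides keep the word with the larger p1[x] * p2[y]
    (site-1 and site-2 probabilities, assumed strictly positive)."""
    code = set()
    for i, x in enumerate(NUCLEOTIDES):
        for y in NUCLEOTIDES[i + 1:]:
            code.add(x + y if p1[x] * p2[y] >= p1[y] * p2[x] else y + x)
    return code


# A maximal dinucleotide circular code keeps one word from each of the
# 6 pairs {xy, yx}; of the 2**6 choices, only the acyclic ones are circular.
PAIRS = [(x + y, y + x) for i, x in enumerate(NUCLEOTIDES) for y in NUCLEOTIDES[i + 1:]]
maximal_codes = [set(c) for c in product(*PAIRS) if is_dinucleotide_circular(c)]
print(len(maximal_codes))  # -> 24

# Illustrative, made-up site frequencies (NOT data from the paper).
p1 = {"A": 0.28, "C": 0.22, "G": 0.33, "T": 0.17}
p2 = {"A": 0.30, "C": 0.23, "G": 0.17, "T": 0.30}
code = product_model_code(p1, p2)
print(sorted(code))  # -> ['AT', 'CA', 'CT', 'GA', 'GC', 'GT']
print(is_dinucleotide_circular(code), is_dinucleotide_circular(cyclic_permutation(code)))
# -> True True
```

Reversing every word reverses every edge of the graph, so the cyclic permutation of a circular dinucleotide code is again circular. This fits the abstract's remark that the two observed codes are related by that permutation.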
- Research Article
1
- 10.1016/j.biosystems.2024.105308
- Aug 17, 2024
- BioSystems
Circular code identified by the codon usage
- Research Article
1
- 10.1016/j.biosystems.2024.105263
- Jul 4, 2024
- BioSystems
Circular cut codes in genetic information
- Research Article
53
- 10.1016/j.jtbi.2015.04.009
- Apr 29, 2015
- Journal of Theoretical Biology
The maximal C3 self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses
- Research Article
8
- 10.1016/j.mbs.2019.108231
- Jul 17, 2019
- Mathematical Biosciences
Mixed circular codes
- Research Article
3
- 10.1016/j.biosystems.2021.104431
- Apr 22, 2021
- BioSystems
Genes on the circular code alphabet
- Research Article
35
- 10.1016/j.gene.2019.03.069
- Mar 30, 2019
- Gene
Spontaneous evolution of circular codes in theoretical minimal RNA rings
- Abstract
- 10.1016/j.nbt.2018.05.1145
- Sep 22, 2018
- New Biotechnology
The production of methacrylic acid in Escherichia coli
- Research Article
- 10.4081/jsas.2012.59
- Dec 1, 2012
- Journal of the Siena Academy of Sciences
In 1994, through a statistical study of trinucleotide occurrences per frame, D. Arquès and C. Michel identified the following set of 20 trinucleotides in the gene populations of both eukaryotes (EUK) and prokaryotes (PRO): {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC}. This set of words of length three over the genetic alphabet A4 = {A, C, G, T} has remarkable properties: it is self-complementary, it is a circular code, it is maximal, and it has the C3 property. This and other identifications of trinucleotide circular codes in different genomes over the last twenty years have raised interest in the concept of trinucleotide circular codes for genetics. In 2003 we found an efficient algorithm for testing the circularity of a trinucleotide code, and in 2005 we found the list of all 528 maximal self-complementary circular codes. In 2008 we presented a hierarchy of the self-complementary circular codes, and in 2012 a hierarchy of all circular codes. These circular codes could allow the identification of as yet unknown coding regions of DNA, either in parallel with or in place of existing methods used by biologists. A hierarchy of tetranucleotide circular codes is one of our future aims. In this paper we begin the study of unbordered tetranucleotides and of the "forbidden configurations" for tetranucleotides, and we give a first result.
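A minimal sketch for checking the properties listed above on the 20-trinucleotide set. It does not reproduce the 2003 algorithm mentioned in the text. Instead it uses the graph criterion for trinucleotide codes from later work (Fimmel, Michel and Strüngmann, 2016): each word N1N2N3 contributes the edges N1 → N2N3 and N1N2 → N3, and the code is circular if and only if the resulting graph is acyclic. The C3 property is checked by testing the code together with its two circular permutations (N1N2N3 → N2N3N1 and N1N2N3 → N3N1N2). All function names are illustrative.

```python
# The 20 trinucleotides listed above.
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word):
    return "".join(COMPLEMENT[n] for n in reversed(word))


def is_self_complementary(code):
    """Every word's reverse complement also belongs to the code."""
    return all(reverse_complement(w) in code for w in code)


def circular_permutation(code, k):
    """Shift every word by k letters (k = 1: N1N2N3 -> N2N3N1)."""
    return {w[k:] + w[:k] for w in code}


def is_trinucleotide_circular(code):
    """Graph criterion: each word N1N2N3 adds edges N1 -> N2N3 and
    N1N2 -> N3; the code is circular iff this graph has no cycle."""
    graph = {}
    for w in code:
        graph.setdefault(w[0], set()).add(w[1:])
        graph.setdefault(w[:2], set()).add(w[2])
    state = {}  # vertex -> "active" while on the DFS stack, "done" afterwards

    def reaches_cycle(u):
        state[u] = "active"
        for v in graph.get(u, ()):
            if state.get(v) == "active" or (v not in state and reaches_cycle(v)):
                return True
        state[u] = "done"
        return False

    return not any(u not in state and reaches_cycle(u) for u in list(graph))


def has_c3_property(code):
    """The code and its two circular permutations are all circular codes."""
    return all(is_trinucleotide_circular(circular_permutation(code, k)) for k in range(3))


print(len(X), is_self_complementary(X), is_trinucleotide_circular(X), has_c3_property(X))
# -> 20 True True True
```

Maximality is not tested exhaustively here. Twenty words is in any case the largest possible size: the 60 non-periodic trinucleotides fall into 20 classes of circular permutations, and a circular code can contain at most one word per class (for example, ACG and CGA together already create the cycle A → CG → A).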
- Research Article
1
- 10.1016/j.biosystems.2024.105215
- Apr 17, 2024
- BioSystems
Circular code in introns
- Research Article
22
- 10.1016/j.jtbi.2014.03.029
- Mar 31, 2014
- Journal of Theoretical Biology
A genetic scale of reading frame coding
- Research Article
14
- 10.1016/j.mbs.2017.10.001
- Oct 10, 2017
- Mathematical Biosciences
Diletter circular codes over finite alphabets
- Research Article
6
- 10.1016/j.biosystems.2021.104368
- Feb 7, 2021
- BioSystems
Potential role of the X circular code in the regulation of gene expression
- Research Article
38
- 10.3390/life7020020
- Apr 18, 2017
- Life
In 1996, a set of 20 trinucleotides was identified in the genes of both prokaryotes and eukaryotes that has, on average, the highest occurrence in the reading frame compared with its two shifted frames. Furthermore, this set has an interesting mathematical property, as it is a maximal self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. That method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. Here we extend this definition to the gene level. This new statistical approach gives all genes, whether long or short, the same weight when searching for the circular code. As a consequence, the concept of a circular code, in particular reading frame retrieval, is directly associated with each gene. At the gene level, the circular code is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code. Finally, by studying viral genes, the circular code was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes.
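The frame-preference idea can be sketched as follows: count each trinucleotide in the reading frame (frame 0) and in the two shifted frames of a sequence, then ask in which frame it occurs most. `gene_level_frequencies` is a hypothetical simplification of the gene-level approach, not the statistic used in the paper. It converts each gene's counts to relative frequencies per frame before averaging, so that short and long genes carry equal weight. The sequences are toy strings.

```python
from collections import Counter


def frame_counts(gene):
    """Count trinucleotides starting at positions 0, 3, 6, ... (frame 0,
    the reading frame) and in the shifted frames 1 and 2."""
    counts = [Counter(), Counter(), Counter()]
    for i in range(len(gene) - 2):
        counts[i % 3][gene[i:i + 3]] += 1
    return counts


def preferred_frame(counts, trinucleotide):
    """Frame (0, 1 or 2) with the strictly highest count, or None on a tie."""
    values = [frame[trinucleotide] for frame in counts]
    best = max(values)
    return values.index(best) if values.count(best) == 1 else None


def gene_level_frequencies(genes, trinucleotide):
    """HYPOTHETICAL gene-level statistic: relative frequency of the
    trinucleotide in each frame, computed per gene and then averaged so
    that every gene has the same weight regardless of its length."""
    totals = [0.0, 0.0, 0.0]
    for gene in genes:
        for f, frame in enumerate(frame_counts(gene)):
            n = sum(frame.values())
            totals[f] += frame[trinucleotide] / n if n else 0.0
    return [t / len(genes) for t in totals]


counts = frame_counts("GAAGAC")
print(preferred_frame(counts, "GAA"), preferred_frame(counts, "AGA"))  # -> 0 2
print(gene_level_frequencies(["GAAGAC", "AAGAAG"], "GAA"))  # -> [0.25, 0.0, 0.5]
```

Pooling raw counts over all genes would instead reproduce the population-level view, in which long genes dominate.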
- Research Article
40
- 10.1016/j.jtbi.2011.01.028
- Jan 26, 2011
- Journal of Theoretical Biology
Circular codes revisited: A statistical approach
- Research Article
8
- 10.1016/j.jtbi.2015.08.029
- Sep 14, 2015
- Journal of Theoretical Biology
Maximal dinucleotide and trinucleotide circular codes