Abstract
In 1996, a set of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has, on average, the highest occurrence in the reading frame compared to its two shifted frames. Furthermore, this set has an interesting mathematical property, as it is a maximal self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. The method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. We extend here this definition to the gene level. This new statistical approach gives all genes, whether long or short, the same weight when searching for the circular code. As a consequence, the concept of circular code, in particular the reading frame retrieval, is directly associated with each gene. At the gene level, the circular code is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code. Finally, by studying viral genes, the circular code was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes.
Highlights
Circular code is a mathematical structure of genes and genomes
We recall a few definitions without detailed explanations for understanding the main properties of the trinucleotide circular code X identified in genes [1,2]
The method developed in [1] for identifying the circular code X in genes determined the preferential frame of trinucleotides at the gene population level, i.e., after summing the trinucleotide frequencies of all genes in a kingdom. We extend this method to the gene level, i.e., the preferential frame of trinucleotides among the three frames is determined for each gene.
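The gene-level idea can be illustrated with a minimal Python sketch. It is not the statistic used in [1] or in this work, whose exact definition is not given here. It assumes that the preferential frame of a trinucleotide in a gene is simply the frame with the strictly highest relative frequency, and that each gene then contributes one equally weighted vote per trinucleotide. The function names `frame_counts`, `preferential_frames`, and `frame_votes` are hypothetical.

```python
from collections import Counter


def frame_counts(gene):
    """Count the trinucleotides read in frames 0, 1 and 2 of one gene."""
    gene = gene.upper()
    return [
        Counter(gene[i:i + 3] for i in range(frame, len(gene) - 2, 3))
        for frame in range(3)
    ]


def preferential_frames(gene):
    """Map each trinucleotide to the frame where its relative frequency is
    strictly highest in this gene (ties are skipped; an assumption)."""
    counts = frame_counts(gene)
    totals = [sum(c.values()) for c in counts]
    result = {}
    for trinucleotide in set().union(*counts):
        freqs = [
            counts[f][trinucleotide] / totals[f] if totals[f] else 0.0
            for f in range(3)
        ]
        best = max(freqs)
        if freqs.count(best) == 1:
            result[trinucleotide] = freqs.index(best)
    return result


def frame_votes(genes):
    """Give every gene the same weight: one vote per trinucleotide for its
    preferential frame, regardless of gene length."""
    votes = {}
    for gene in genes:
        for trinucleotide, frame in preferential_frames(gene).items():
            votes.setdefault(trinucleotide, [0, 0, 0])[frame] += 1
    return votes


# Toy sequences, for illustration only (not real genes).
print(frame_votes(["AACAACAAC", "AACGATGGCTTCCAG"])["AAC"])
```

Counting votes per gene, rather than summing raw trinucleotide counts over a whole kingdom, is what keeps long genes from dominating the result in this sketch.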
Summary
Circular code is a mathematical structure of genes and genomes. This concept, initially found for genes, has been extended to genomes (non-coding regions of eukaryotes) according to recent results. A circular code X is a set of words such that any motif from X, called an X motif, makes it possible to retrieve, maintain, and synchronize the original (construction) frame. The circular code X identified in the genes of bacteria, eukaryotes, plasmids, and viruses [1,2] contains the following 20 trinucleotides:

X = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC}   (1)

It makes it possible both to retrieve the reading frame with a window of 13 nucleotides (Figure 3 in [3]) and to code the following 12 amino acids:

{Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Phe, Thr, Tyr, Val}.   (2)

The current genetic code is not circular. The loss during evolution of this circular code property on the 4-letter alphabet {A, C, G, T} required a complex translation mechanism using 20 amino acids and proteins in current genomes.
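The properties stated for X can be checked with a short Python sketch. It verifies that the set in (1) is self-complementary and that it codes the 12 amino acids in (2) under the standard genetic code. It also illustrates frame retrieval on a toy sequence. The helper `frames_in_x` is a hypothetical name and a simplification: it only reports which of the three frames of a window read entirely as trinucleotides of X, and it does not prove circularity or the 13-nucleotide bound.

```python
# The 20 trinucleotides of the circular code X, as listed in (1).
X = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

# Standard genetic code, built in the usual TCAG order (one-letter codes).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(word):
    """Reverse complement of a DNA word on the alphabet {A, C, G, T}."""
    return word.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def frames_in_x(window):
    """Return the offsets (0, 1, 2) at which every complete trinucleotide of
    the window belongs to X. A window too short to hold any trinucleotide
    at a given offset trivially passes at that offset."""
    return [
        offset
        for offset in range(3)
        if all(window[i:i + 3] in X for i in range(offset, len(window) - 2, 3))
    ]


# Self-complementarity: X equals the set of its reverse complements.
print({reverse_complement(t) for t in X} == X)

# Amino acids coded by X (one-letter codes for the 12 listed in (2)).
print(sorted({GENETIC_CODE[t] for t in X}))

# Toy X motif AAC GAT GGC TTC CAG, read from its first and second nucleotide.
motif = "AACGATGGCTTCCAG"
print(frames_in_x(motif), frames_in_x(motif[1:]))
```

In the toy motif, only one offset survives in each window, so the construction frame is recovered even when the window starts one nucleotide late.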