Abstract

The self-complementary subset T0 = X0 ∪ {AAA, TTT}, with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC}, of 22 trinucleotides occurs preferentially in frame 0 (the reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. The subsets T1 = X1 ∪ {CCC} and T2 = X2 ∪ {GGG} of 21 trinucleotides occur preferentially in the shifted frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5′–3′ direction). T1 and T2 are complementary to each other. The subset T0 contains the subset X0, which has the rare property (6×10⁻⁸) of being a complementary maximal circular code with two permuted maximal circular codes X1 and X2 in frames 1 and 2 respectively. X0 is called a C³ code. A quantitative study of these three subsets T0, T1, T2 in the three frames 0, 1, 2 of protein genes, and in the 5′ and 3′ regions of eukaryotes, shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of T0, T1, T2 in frame 0 of protein genes are 49, 28.5 and 22.5% respectively. In contrast, the frequencies of T0, T1, T2 in the 5′ and 3′ regions of eukaryotes are independent of the frame. Indeed, the frequency of T0 in the three frames of the 5′ (respectively 3′) regions is equal to 35.5% (respectively 38%) and is greater than the frequencies of T1 and T2, both equal to 32.25% (respectively 31%) in the three frames. Several frequency asymmetries that were unexpectedly observed (e.g. the frequency difference between T1 and T2 in frame 0) are related to a new property of the subset T0 involving substitutions.
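The set relations stated above can be checked mechanically. The following is a minimal sketch, not the authors' code: it assumes that "complementary" means reverse-complementary and that X1 and X2 are obtained from X0 by one and two left circular shifts of each trinucleotide; the helper names (`reverse_complement`, `rotate`, `is_self_complementary`) are illustrative.

```python
from itertools import product

# X0 as listed in the abstract (20 trinucleotides).
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement each nucleotide, then reverse (assumed meaning of 'complementary')."""
    return word.translate(_COMPLEMENT)[::-1]


def rotate(word, k):
    """Left circular shift by k positions (assumed meaning of 'permuted')."""
    return word[k:] + word[:k]


def is_self_complementary(words):
    """True if the set is mapped onto itself by reverse complementation."""
    return {reverse_complement(w) for w in words} == set(words)


X1 = {rotate(w, 1) for w in X0}
X2 = {rotate(w, 2) for w in X0}

T0 = X0 | {"AAA", "TTT"}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}

ALL_TRINUCLEOTIDES = {"".join(p) for p in product("ACGT", repeat=3)}

if __name__ == "__main__":
    print("T0 self-complementary:", is_self_complementary(T0))
    print("T1, T2 complementary:", {reverse_complement(w) for w in T1} == T2)
    print("T0, T1, T2 cover all 64:", T0 | T1 | T2 == ALL_TRINUCLEOTIDES)
```

Under these assumptions the three subsets are pairwise disjoint and together cover the 64 trinucleotides, which is why every trinucleotide read in any frame can be assigned to exactly one of T0, T1, T2. The sketch does not test the circular-code property itself.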
An evolutionary analytical model with three parameters (p, q, t), based on an independent mixing of the 22 codons (trinucleotides in frame 0) of T0 with equiprobability (1/22) followed by t ≈ 4 substitutions per codon according to the proportions p ≈ 0.1, q ≈ 0.1 and r = 1 − p − q ≈ 0.8 in the three codon sites respectively, retrieves the frequencies of T0, T1, T2 observed in the three frames of protein genes and explains these asymmetries. Furthermore, the same model (0.1, 0.1, t) after t ≈ 22 substitutions per codon retrieves the statistical properties observed in the three frames of the 5′ and 3′ regions. The complex behaviour of these analytical curves is totally unexpected and a priori difficult to imagine.
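The mixing-then-substitution process can be illustrated with a small Monte Carlo sketch. This is not the analytical model of the paper: it assumes that exactly `n_subs` substitution events hit each codon, that each event picks a codon site with probabilities (p, q, 1 − p − q) and replaces the nucleotide with one of the three others uniformly, and that X1 and X2 are left circular shifts of X0. All function names are hypothetical, and no claim is made that the simulated values match the percentages reported above.

```python
import random

X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
T0 = X0 | {"AAA", "TTT"}
T1 = {w[1:] + w[:1] for w in X0} | {"CCC"}  # assumed: one left shift
T2 = {w[2:] + w[:2] for w in X0} | {"GGG"}  # assumed: two left shifts

# Map each of the 64 trinucleotides to the subset that contains it.
LABEL = {w: name for name, s in (("T0", T0), ("T1", T1), ("T2", T2)) for w in s}


def mutate(codon, n_subs, p, q, rng):
    """Apply n_subs substitutions; sites are drawn with weights (p, q, 1-p-q)."""
    bases = list(codon)
    for site in rng.choices([0, 1, 2], weights=[p, q, 1.0 - p - q], k=n_subs):
        bases[site] = rng.choice([b for b in "ACGT" if b != bases[site]])
    return "".join(bases)


def simulate(n_codons, n_subs, p=0.1, q=0.1, seed=0):
    """Concatenate equiprobable T0 codons, each mutated independently."""
    rng = random.Random(seed)
    pool = sorted(T0)  # sorted so that a given seed is reproducible
    return "".join(mutate(rng.choice(pool), n_subs, p, q, rng)
                   for _ in range(n_codons))


def frame_frequencies(seq, frame):
    """Relative frequencies of T0, T1, T2 among trinucleotides read in a frame."""
    counts = {"T0": 0, "T1": 0, "T2": 0}
    for i in range(frame, len(seq) - 2, 3):
        counts[LABEL[seq[i:i + 3]]] += 1
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


if __name__ == "__main__":
    for n_subs in (0, 4, 22):
        seq = simulate(20000, n_subs)
        for frame in (0, 1, 2):
            print(n_subs, frame, frame_frequencies(seq, frame))
```

With `n_subs = 0` every trinucleotide read in frame 0 belongs to T0 by construction, which gives a simple sanity check before the substitution step is varied.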
