Abstract
The similarity patterns of the genetic code result from similar codons encoding similar messages. We develop a new mathematical model to analyze these patterns. The physicochemical characteristics of amino acids objectively quantify their differences and similarities; the Hamming metric does the same for the 64 codons of the codon set. (The Hamming distance between two codons is the number of positions at which they differ: AAA and AAC are at distance 1, and the maximum distance between codons is 3.) The CodonPolytope, a 9-dimensional geometric object, is spanned by 64 vertices that represent the codons, and the Euclidean distances between these vertices correspond one-to-one with intercodon Hamming distances. The CodonGraph represents the vertices and edges of the polytope; each edge corresponds to a Hamming distance of 1. The mirror reflection symmetry group of the polytope is isomorphic to the largest permutation symmetry group of the codon set that preserves Hamming distances. These groups contain 82,944 symmetries. Many polytope symmetries coincide with the degeneracy and similarity patterns of the genetic code. These code symmetries are strongly related to the face structure of the polytope, with smaller faces displaying stronger code symmetries. Splitting the polytope stepwise into smaller faces models an early evolution of the code that generates this hierarchy of code symmetries. The canonical code represents a class of 41,472 codes with equivalent symmetries: a single class among an astronomical number of symmetry classes comprising all possible codes.
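The Hamming metric and the CodonGraph described above can be illustrated with a minimal Python sketch. The nucleotide alphabet `ACGU` and the names `hamming`, `CODONS`, `edges` and `symmetry_order` are illustrative assumptions, not the paper's notation. The last line relies on the standard result that the distance-preserving permutations of three-letter words over a four-letter alphabet are generated by per-position letter permutations and permutations of the positions; the paper's own derivation of the group is not reproduced here.

```python
from itertools import product
from math import factorial

# Assumed RNA alphabet; the source only shows codons such as AAA and AAC.
NUCLEOTIDES = "ACGU"
CODONS = ["".join(c) for c in product(NUCLEOTIDES, repeat=3)]


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


# CodonGraph edges: unordered codon pairs at Hamming distance 1.
edges = [
    (a, b)
    for i, a in enumerate(CODONS)
    for b in CODONS[i + 1:]
    if hamming(a, b) == 1
]

# Each codon has 3 positions x 3 alternative letters = 9 neighbours.
neighbours_of_aaa = [c for c in CODONS if hamming("AAA", c) == 1]

# Order of the Hamming-distance-preserving permutation group:
# 4! letter permutations per position, times 3! position permutations.
symmetry_order = factorial(4) ** 3 * factorial(3)

print(len(CODONS), len(edges), symmetry_order)
```

Under these assumptions the count `4!^3 * 3!` reproduces the 82,944 symmetries quoted in the abstract.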
Highlights
The canonical genetic code, as summarized by the codon table (Figure 1), consists of 64 codons or code words, and each word encodes a single message: an amino acid or stop codon.
Simplified amino acid alphabets consist of various sets of similar amino acids, such as a size-2 alphabet composed of a set of hydrophobic and a set of hydrophilic amino acids. As will be discussed in Section 6, amino acids belonging to the same set are often grouped together in the codon table.
As a first step, we develop a geometric model of the code that faithfully maps Hamming distances onto Euclidean distances: the CodonPolytope (Section 4).
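One way to see how Hamming distances can be mapped faithfully onto Euclidean distances in 9 dimensions is sketched below. This is an assumed construction for illustration, not necessarily the coordinates used in Section 4: each nucleotide is sent to a vertex of a regular tetrahedron in 3-space, and a codon is the concatenation of its three nucleotide points. The alphabet `ACGU`, the table `TETRAHEDRON` and the helper names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed alphabet

# Hypothetical coordinates: four vertices of a regular tetrahedron.
# Every pair of vertices is at squared Euclidean distance 8.
TETRAHEDRON = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "U": (-1, -1, 1),
}


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def embed(codon):
    """Concatenate the three tetrahedron vertices into a 9-vector."""
    return tuple(x for base in codon for x in TETRAHEDRON[base])


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


CODONS = ["".join(c) for c in product(NUCLEOTIDES, repeat=3)]
POINTS = {c: embed(c) for c in CODONS}

# Each differing position contributes 8 to the squared distance, so the
# squared Euclidean distance is exactly 8 times the Hamming distance.
distances_match = all(
    squared_distance(POINTS[a], POINTS[b]) == 8 * hamming(a, b)
    for a in CODONS
    for b in CODONS
)
print(distances_match)
```

In this sketch the Euclidean distance is a fixed increasing function of the Hamming distance (the square root of 8 times it), so the two metrics determine each other. That is the kind of one-to-one correspondence the text describes.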
Summary
The canonical genetic code, as summarized by the codon table (Figure 1), consists of 64 codons or code words, and each word encodes a single message: an amino acid or stop codon. While the codon set with Hamming metric possesses permutation symmetries that preserve this metric, the polytope displays Euclidean symmetries and space coordinates that greatly facilitate the computational analysis of the similarity patterns of the code (Sections 5, 6 and 7). A 9-polytope differs significantly from the 6-cube and the other mathematical models for the genetic code referenced above, but the polytope corresponds uniquely to the graph representation of the codon set with Hamming metric, the CodonGraph (Section 2.3, [24]). The polytope symmetries partition the astronomically large number of all possible codes mapping 64 codons onto 21 messages into symmetry equivalence classes. This classification quantifies the uniqueness of the genetic code and its symmetries (Section 7). These findings and applications of the CodonPolytope for the analysis of the genetic code are discussed in Section 8.
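The scale of this classification can be made concrete with a short Python calculation. It rests on two labeled assumptions: that "onto" means every one of the 21 messages is used at least once (a surjective assignment), and that each symmetry class is an orbit of the 82,944-element symmetry group, so that the orbit-stabilizer relation applies. The figures 82,944 and 41,472 are taken from the abstract; the variable names are illustrative.

```python
from math import comb

N_CODONS = 64
N_MESSAGES = 21  # 20 amino acids plus stop

# All assignments of a message to each codon, surjective or not.
all_assignments = N_MESSAGES ** N_CODONS

# Surjective assignments by inclusion-exclusion over unused messages.
surjective = sum(
    (-1) ** k * comb(N_MESSAGES, k) * (N_MESSAGES - k) ** N_CODONS
    for k in range(N_MESSAGES + 1)
)

GROUP_ORDER = 82_944           # symmetries, from the abstract
CANONICAL_CLASS_SIZE = 41_472  # codes in the canonical class, from the abstract

# Assumption: a class is a group orbit, so its size is |G| / |stabilizer|.
stabilizer_order = GROUP_ORDER // CANONICAL_CLASS_SIZE

# No orbit can hold more than |G| codes, which gives a lower bound
# (rounded up) on the number of classes among surjective codes.
min_classes = -(-surjective // GROUP_ORDER)

print(len(str(all_assignments)), stabilizer_order)
```

Under the orbit assumption, a class of 41,472 codes in a group of 82,944 symmetries means that exactly two symmetries, the identity and one other, leave the canonical code unchanged.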