Abstract
This article presents a matrix method for the comparative analysis of long nucleotide sequences in which each sequence is represented as three binary sequences. The method relies on a set of symmetries among the biochemical attributes of nucleotides. It also relies on the fact that every complete set of N-mers can be presented as one member of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visualized as an individual fractal-like mosaic or another regular mosaic of binary type. In contrast to natural nucleotide sequences, artificial random sequences give non-regular patterns. Examples of binary mosaics of long nucleotide sequences are shown, including cases of human chromosomes and penicillins. The results are then discussed.
Highlights
Long nucleotide sequences are studied by many authors because of their importance for bioinformatics and theoretical biology
These mathematical notions are well known in the theory of digital signal processing, but are relatively new in bioinformatics, where they help in developing algebraic biology [24,25,26,27,28,29,30], a wide modern branch of theoretical and mathematical biology
One should emphasize that our method introduces an important notion in the field of molecular genetics and bioinformatics: binary fractals
Summary
Long nucleotide sequences are studied by many authors because of their importance for bioinformatics and theoretical biology. In genetic matrices of the Kronecker family (see Figure 1), each row has its own binary number, because all N-plets in that row have an identical binary representation from the point of view of the first sub-alphabet in Figure 2. Likewise, each column has its own binary number, because all N-plets in that column have an identical binary representation from the point of view of the second sub-alphabet in Figure 2. In the (8 × 8)-matrix [A G; C T]^(3) in Figure 1 (the third Kronecker power of [A G; C T]), the third column has the binary number 010 because each of its triplets (AGA, AGC, ATA, ATC, CGA, CGC, CTA, CTC) is an “amino–keto–amino” sequence, which corresponds to the binary number 010 from the point of view of the second sub-alphabet in Figure 2. Such gradual transition is described by means of a series of Kronecker multiplications of matrices.
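The row and column numbering described above can be checked with a short sketch. This is an illustration only, not the authors' implementation: the helper names (`kron_strings`, `kron_power`, `encode`) are hypothetical, string concatenation stands in for the product of matrix entries, and the digit assignment purine = 0, pyrimidine = 1 for the first sub-alphabet is an assumption inferred from the row order of [A G; C T]. The assignment amino = 0, keto = 1 follows the 010 example in the text.

```python
# Base (2 x 2) genetic matrix [A G; C T] as given in the text.
BASE = [["A", "G"],
        ["C", "T"]]

# Binary sub-alphabets. amino = 0 / keto = 1 follows the text's example
# ("amino-keto-amino" -> 010). purine = 0 / pyrimidine = 1 is an ASSUMPTION
# inferred from the row order of the base matrix.
FIRST = {"A": "0", "G": "0", "C": "1", "T": "1"}   # purine vs pyrimidine (assumed digits)
SECOND = {"A": "0", "C": "0", "G": "1", "T": "1"}  # amino vs keto


def kron_strings(left, right):
    """Kronecker product of two matrices of strings.

    The usual product of entries is replaced by string concatenation,
    so the result holds longer N-plets.
    """
    return [
        [a + b for a in left_row for b in right_row]
        for left_row in left
        for right_row in right
    ]


def kron_power(base, n):
    """n-th Kronecker power of `base` (n >= 1), built by repeated multiplication."""
    result = base
    for _ in range(n - 1):
        result = kron_strings(result, base)
    return result


def encode(nplet, table):
    """Binary representation of an N-plet under one sub-alphabet."""
    return "".join(table[letter] for letter in nplet)


matrix = kron_power(BASE, 3)                # the (8 x 8) matrix of triplets
third_column = [row[2] for row in matrix]   # column with binary number 010

print(third_column)
# ['AGA', 'AGC', 'ATA', 'ATC', 'CGA', 'CGC', 'CTA', 'CTC']
print({encode(t, SECOND) for t in third_column})
# {'010'}
```

Under these assumptions, every triplet in row i encodes to the 3-bit form of i under `FIRST`, and every triplet in column j encodes to the 3-bit form of j under `SECOND`, which is the property the summary states for rows and columns.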