Abstract

This article presents a matrix method for the comparative analysis of long nucleotide sequences in which each sequence is represented as three binary sequences. The method uses a set of symmetries of the biochemical attributes of nucleotides. It also uses the fact that every complete set of N-mers can be presented as one member of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visualized as an individual fractal-like mosaic or another regular mosaic of binary type. In contrast to natural nucleotide sequences, artificial random sequences give non-regular patterns. Examples of binary mosaics of long nucleotide sequences are shown, including cases of human chromosomes and penicillins. The obtained results are then discussed.

Highlights

  • Long nucleotide sequences are studied by many authors because of their importance for bioinformatics and theoretical biology

  • These mathematical notions are well known in the theory of digital signal processing, but are relatively new in bioinformatics, where they help in developing algebraic biology [24,25,26,27,28,29,30], a wide modern branch of theoretical and mathematical biology

  • One should emphasize that our method introduces an important notion in the field of molecular genetics and bioinformatics: binary fractals


Summary

Introduction

Long nucleotide sequences are studied by many authors because of their importance for bioinformatics and theoretical biology. In genetic matrices of the Kronecker family (see Figure 1), each row has its own binary number, because all N-plets in that row have an identical binary representation with respect to the first sub-alphabet in Figure 2. Likewise, each column has its own binary number, because all N-plets in that column have an identical binary representation with respect to the second sub-alphabet in Figure 2. For example, in the (8 × 8)-matrix [A G; C T]^(3) in Figure 1 (the third Kronecker power of [A G; C T]), the third column has the binary number 010 because each of its triplets (AGA, AGC, ATA, ATC, CGA, CGC, CTA, CTC) is an “amino–keto–amino” sequence, which corresponds to the binary number 010 with respect to the second sub-alphabet in Figure 2. Such a gradual transition is described by a series of Kronecker multiplications of matrices.
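The row and column numbering described above can be checked with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the helper names `kron_power` and `encode` are hypothetical, the amino = 0 / keto = 1 coding is taken from the 010 example in the text, and the purine = 0 / pyrimidine = 1 coding for rows is an assumption inferred from the rows of [A G; C T], since the sub-alphabets of Figure 2 are not reproduced here.

```python
# Base 2x2 genetic matrix [A G; C T] from the text.
BASE = [["A", "G"],
        ["C", "T"]]

# Column coding taken from the text's example: amino (A, C) = 0, keto (G, T) = 1.
AMINO_KETO = {"A": "0", "C": "0", "G": "1", "T": "1"}
# Row coding is an ASSUMPTION inferred from the rows of BASE:
# purine (A, G) = 0, pyrimidine (C, T) = 1.
PURINE_PYRIMIDINE = {"A": "0", "G": "0", "C": "1", "T": "1"}


def kron_power(base, n):
    """Symbolic n-th Kronecker power; entry 'products' are concatenated letters."""
    result = [[""]]
    for _ in range(n):
        # Standard Kronecker block layout: each entry of `result`
        # is expanded into a full copy of `base`.
        result = [
            [left + right for left in row_l for right in row_b]
            for row_l in result
            for row_b in base
        ]
    return result


def encode(nplet, table):
    """Binary string of an N-plet under one binary sub-alphabet."""
    return "".join(table[letter] for letter in nplet)


matrix = kron_power(BASE, 3)                  # the (8 x 8)-matrix of triplets
third_column = [row[2] for row in matrix]     # column index 2 is binary 010
print(third_column)

# Every triplet in a column shares the column's binary number (amino/keto),
# and every triplet in a row shares the row's binary number (assumed coding).
for r, row in enumerate(matrix):
    for c, triplet in enumerate(row):
        assert encode(triplet, AMINO_KETO) == format(c, "03b")
        assert encode(triplet, PURINE_PYRIMIDINE) == format(r, "03b")
```

Concatenating letters in place of numeric multiplication is a design choice that keeps the matrix symbolic, so each entry of the n-th power is directly the N-plet occupying that cell.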

The Description of the Matrix Method for Long Nucleotide Sequences
Examples of patterns
Long Random Sequences
Patterns of Human Chromosomes
Patterns of Penicillin
About 3D-Representations
Conclusions
