Abstract

A novel representation of DNA sequences that combines the global and local position information of the original sequence is proposed to distinguish different species. First, to fully exploit the global information, a graphical representation of the DNA sequence is formulated based on the Fermat spiral curve. Then, to account for the local characteristics of the sequence, each point on the Fermat spiral curve is assigned a mass determined by the relationships among four neighboring nucleotides. In this paper, the normalized moments of inertia of the Fermat spiral curve composed of these mass points are calculated as the numerical description of the corresponding DNA sequence, using the first exons of beta-globin genes. With the Euclidean distance as the measure between numerical descriptions, the resulting similarities between species demonstrate the performance of the proposed method.
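The final comparison step can be sketched as follows, assuming each species has already been reduced to a fixed-length vector of normalized moments of inertia. The helper name `distance_matrix`, the species names and the vector values are hypothetical and only illustrate the Euclidean-distance comparison; smaller distances indicate more similar sequences.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def distance_matrix(
    descriptors: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], float]:
    """Pairwise Euclidean distances between species descriptor vectors.

    Hypothetical helper: each value in `descriptors` is assumed to be a
    vector of normalized moments of inertia of equal length.
    """
    names = list(descriptors)
    # math.dist raises ValueError if two vectors differ in length.
    return {
        (p, q): math.dist(descriptors[p], descriptors[q])
        for p, q in combinations(names, 2)
    }
```

For example, `distance_matrix({"species_a": [0.0, 0.0], "species_b": [3.0, 4.0]})` returns `{("species_a", "species_b"): 5.0}`.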

Highlights

  • Graphical and numerical representations of DNA, RNA or protein sequences have become popular strategies for analyzing the evolutionary relationships between species

  • We present a novel representation of DNA sequences based on global and local position information

  • The proposed representation involves (1) formulating a graphical representation of the DNA sequence based on the Fermat spiral curve, which retains the global position information of the original sequence; (2) incorporating the local position information of the DNA sequence by attaching a mass to each point on the Fermat spiral curve; and (3) calculating the normalized moments of inertia of the Fermat spiral curve composed of these mass points as the description of the corresponding DNA sequence, applied to the first exons of beta-globin genes
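Steps (2) and (3) can be illustrated with a minimal sketch of the moment of inertia of a set of point masses. Two assumptions are made here: the moment is taken about the pole of the polar coordinate system (so each point contributes $m\rho^2$), and "normalized" is read as "divided by the total mass". The masses are passed in by the caller because the rule that derives them from four neighboring nucleotides is not given in this summary; the function name is hypothetical.

```python
from typing import Sequence


def normalized_moment_of_inertia(
    rho: Sequence[float], mass: Sequence[float]
) -> float:
    """Moment of inertia about the pole, divided by the total mass.

    Assumption: normalization by total mass (the squared radius of
    gyration); the paper's own normalization may differ.
    """
    if len(rho) != len(mass):
        raise ValueError("rho and mass must have the same length")
    total_mass = sum(mass)
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    # Each point mass m at polar radius r contributes m * r^2.
    inertia = sum(m * r * r for r, m in zip(rho, mass))
    return inertia / total_mass
```

With radii `[1.0, 2.0]` and masses `[1.0, 3.0]` the moment is `1 + 12 = 13`, and dividing by the total mass of 4 gives `3.25`.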


Summary

Graphical representation of DNA sequence

To make full use of the global information of a DNA sequence, the original DNA sequence (the base sequence) is divided into four subsequences consisting of A, C, G or T, so that four point sets can be obtained from the positions of the nucleotides in the original sequence. Here $N_{BS}$, $N_{AS}$, $N_{CS}$, $N_{GS}$ and $N_{TS}$ denote the lengths of the base sequence and of the A, C, G and T subsequences, respectively.

To plot the base Fermat spiral curve corresponding to the base sequence, the coordinates of its points in the polar coordinate system are calculated from the positions of the nucleotides in the base sequence. Here $\theta_{V_i^{BS}}$ denotes the polar angle of nucleotide $V_i^{BS}$ in the polar coordinate system; $L$ is a constant, the shortest length among the DNA sequences of the different species in the experiment; and $L_{V_i^{BS}}$ denotes the position of nucleotide $V_i^{BS}$ in the base sequence, ranging from 1 to $N_{BS}$. For the nucleotides in the base sequence, the corresponding set of point coordinates in the polar coordinate system is

$$\{\, p_{V_1^{BS}}(\theta_{V_1^{BS}}, \rho_{V_1^{BS}}),\ p_{V_2^{BS}}(\theta_{V_2^{BS}}, \rho_{V_2^{BS}}),\ \ldots,\ p_{V_i^{BS}}(\theta_{V_i^{BS}}, \rho_{V_i^{BS}}) \,\}$$
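The decomposition into position sets and the mapping of positions onto a Fermat spiral can be sketched as below. The spiral itself uses the standard form $\rho = a\sqrt{\theta}$, but the angle mapping $\theta = 2\pi \cdot L_{V_i^{BS}} / L$ and the scale `a` are hypothetical choices for illustration only, since the source's formula for the polar angle is not reproduced in this summary. The function names are also hypothetical.

```python
import math
from typing import Dict, List, Tuple


def position_sets(seq: str) -> Dict[str, List[int]]:
    """1-based positions of each nucleotide; "BS" holds the whole base sequence."""
    seq = seq.upper()
    sets: Dict[str, List[int]] = {"BS": list(range(1, len(seq) + 1))}
    for base in "ACGT":
        sets[base] = [i for i, ch in enumerate(seq, start=1) if ch == base]
    return sets


def fermat_points(
    positions: List[int], L: int, a: float = 1.0
) -> List[Tuple[float, float]]:
    """Polar points (theta, rho) on the Fermat spiral rho = a * sqrt(theta).

    Hypothetical angle mapping: theta = 2 * pi * position / L.
    """
    if L <= 0:
        raise ValueError("L must be positive")
    points = []
    for pos in positions:
        theta = 2 * math.pi * pos / L
        points.append((theta, a * math.sqrt(theta)))
    return points
```

For example, `position_sets("ACGTA")` gives `{"BS": [1, 2, 3, 4, 5], "A": [1, 5], "C": [2], "G": [3], "T": [4]}`, so $N_{BS} = 5$ and $N_{AS} = 2$; each list can then be passed to `fermat_points` to obtain one curve per (sub)sequence.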

Attaching each point with a mass
Numerical Representation
Methods
Results and Discussion
Author Contributions
Additional Information
