Abstract

This paper presents a new method for similarity analysis of protein sequences. Based on the positions, proportions, and various physicochemical properties of the 20 amino acids in each protein sequence, representative information is extracted from the sequence and converted into a numeric vector, so that the similarity of protein sequences can be analyzed through the similarity of their vectors. To allow comparison of protein sequences of different lengths, each sequence is first mapped to a fixed-length vector that encodes the relative positions of the amino acids. The percentages of the 20 amino acids in the sequence are then combined with 3 physicochemical properties to form a physicochemical information vector. Finally, a one-dimensional feature vector of 80 elements representing the protein sequence is synthesized. The shortest-distance (single-linkage) method is applied to cluster the feature vectors and thereby assess the similarity of the protein sequences. In the numerical experiments, similarity analysis is performed on the mitochondrial NADH dehydrogenase sequences of 9 different species. The results are consistent with known biological facts, which validates the effectiveness of the model to a certain extent.
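The pipeline described above (fixed-length position vector, composition combined with physicochemical properties, 80-element feature vector, shortest-distance clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact method: the abstract does not define the position encoding or how percentages and properties are combined. Here we assume the position part is the mean 1-based position of each residue type divided by sequence length (20 values), and the physicochemical part is each amino acid's percentage multiplied by each of 3 caller-supplied property values (20 × 3 = 60 values), giving 20 + 60 = 80 elements. The helper names and the toy property table are hypothetical.

```python
import math
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_vector(seq):
    """ASSUMPTION: 20 values, the mean 1-based position of each residue type
    divided by the sequence length (0.0 if the residue is absent)."""
    n = len(seq)
    vec = []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        vec.append(sum(positions) / (len(positions) * n) if positions else 0.0)
    return vec


def composition_vector(seq):
    """Fraction of the sequence made up of each of the 20 amino acids."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def feature_vector(seq, properties):
    """80 values: 20 relative positions + 20 x 3 composition-weighted properties.
    `properties` maps each amino acid to a tuple of 3 property values (hypothetical)."""
    seq = seq.upper()
    comp = composition_vector(seq)
    phys = [comp[i] * properties[aa][k]
            for k in range(3) for i, aa in enumerate(AMINO_ACIDS)]
    return position_vector(seq) + phys


def single_linkage(vectors):
    """Naive agglomerative clustering with the shortest (single-link) Euclidean
    distance. Returns the merge history as (members_a, members_b, distance)."""
    def linkage(a, b):
        return min(math.dist(vectors[i], vectors[j]) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        d = linkage(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b), d))
    return merges


# Toy property values for illustration only; they are NOT real physicochemical scales.
toy_props = {aa: (i / 19, 1.0, (19 - i) / 19) for i, aa in enumerate(AMINO_ACIDS)}
seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSWPLLVEE"]
vecs = [feature_vector(s, toy_props) for s in seqs]
# Sequences 0 and 1 differ only at the last residue, so they should merge first.
for members_a, members_b, dist in single_linkage(vecs):
    print(members_a, members_b, round(dist, 3))
```

In practice the 3 property columns would be replaced with the scales the authors chose, and a library routine for single-linkage hierarchical clustering could replace the naive quadratic loop for larger sets of sequences.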
