Abstract

Physicochemical properties of amino acids play a significant role in the comparison of protein sequences. In the literature, arbitrary and random combinations of these properties have been used for protein sequence comparison. In the present paper, protein sequences are compared using only five known physical properties of the amino acids. Principal component analysis (PCA) is applied to the numerical values of these five physical properties for the twenty amino acids to reduce their dimension. As a result, 20 TP values are obtained, one for each amino acid. Protein sequences are represented using these 20 TP values, and cumulative sums are then taken over the represented sequences to obtain a non-degenerate representation of each protein sequence. A new descriptor is then obtained using a generalized form of three moment vectors consisting of first-, second- and third-order moments. Distance matrices are computed using the Euclidean distance as the distance measure, and finally phylogenetic trees are constructed from these distance matrices using the UPGMA algorithm. The proposed method is applied to 9 ND4, 9 ND6, 16 ND5, 12 Baculovirus and 24 TF protein sequences. The results obtained by this method are on par with the biological reference and comparable with the results obtained earlier on the same species by other methods.
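The pipeline described above (PCA, cumulative sums, moment descriptor, Euclidean distance, UPGMA) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each "TP value" is assumed to be the amino acid's score on the first principal component of the standardized property table. The moment descriptor uses the simple assumed form m_k = (sum_i c_i^k) / n^k for k = 1, 2, 3 over the cumulative sums c_i, which stands in for, and is not necessarily equal to, the paper's generalized moment vectors. The property table passed in is a hypothetical input; the five physical properties themselves are not listed in the abstract.

```python
import math
from itertools import combinations


def first_pc_scores(props, iters=500):
    """Score each amino acid on the first principal component.

    ASSUMPTION: the per-amino-acid 'TP value' is taken to be this score.
    props: hypothetical dict, letter -> list of property values (equal length).
    """
    letters = sorted(props)
    n, d = len(letters), len(props[letters[0]])
    # Standardize every property column to mean 0 and unit variance.
    cols = []
    for j in range(d):
        col = [props[a][j] for a in letters]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        cols.append([(x - mean) / std for x in col])
    # Covariance matrix of the standardized columns (a correlation matrix).
    cov = [[sum(cols[i][k] * cols[j][k] for k in range(n)) / n
            for j in range(d)] for i in range(d)]
    # Power iteration converges to the dominant eigenvector (first PC).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    # Project each standardized amino-acid vector onto the first PC.
    return {a: sum(cols[j][k] * v[j] for j in range(d))
            for k, a in enumerate(letters)}


def moment_descriptor(seq, tp):
    """Cumulative-sum representation, then first-, second- and third-order moments.

    ASSUMPTION: m_k = sum(c_i ** k) / n ** k, a simple stand-in moment form.
    """
    total, cum = 0.0, []
    for aa in seq:
        total += tp[aa]
        cum.append(total)
    n = len(cum)
    return [sum(c ** k for c in cum) / n ** k for k in (1, 2, 3)]


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    dist maps frozenset({a, b}) -> distance. Returns a Newick-like string.
    """
    clusters = {name: 1 for name in names}  # cluster label -> cluster size
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        na, nb = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        # Size-weighted average of the distances to every remaining cluster.
        for c in clusters:
            d[frozenset((new, c))] = (na * d[frozenset((a, c))]
                                      + nb * d[frozenset((b, c))]) / (na + nb)
        clusters[new] = na + nb
    return next(iter(clusters))


def build_tree(sequences, props):
    """sequences: name -> protein string; props: hypothetical property table."""
    tp = first_pc_scores(props)
    desc = {name: moment_descriptor(s, tp) for name, s in sequences.items()}
    dist = {frozenset((x, y)): math.dist(desc[x], desc[y])
            for x, y in combinations(desc, 2)}
    return upgma(list(desc), dist)
```

As a usage example with made-up toy values (not real amino-acid properties), `build_tree({"s1": "AAGG", "s2": "AAGA", "s3": "GGGG"}, {"A": [1.0, 0, 0, 0, 0], "G": [3.0, 0, 0, 0, 0]})` returns `(s3,(s1,s2))`: the two mostly-`A` sequences are grouped first. Power iteration is used only to keep the sketch dependency-free; a real analysis would more likely call an eigen-solver such as NumPy's.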
