Abstract

Protein distance matrices are widely used in protein sequence analysis. They are mainly obtained from pairwise sequence alignment scores or protein sequence homology, neither of which takes into account the individual physical characteristics of protein sequences and amino acids, or a combination of these features. This paper therefore proposes a new method for constructing a protein distance matrix based on natural amino acid indices in combination with the Discrete Fourier Transform (DFT). With the proposed method, protein distance matrices can be generated from any given set of amino acid indices, each of which represents a unique biological feature of protein sequences. The results in this study are based on a combination of 25 widely accepted amino acid indices, which produced the best results as judged by the biological relationships between proteins. As a case study, 26 Cluster of Differentiation 4 (CD4) protein sequences were used to construct a distance matrix with the proposed method. The results show that the pairwise relationships between the CD4 protein sequences remain the same as those given by their pairwise percent identity. For another group of protein sequences, however, the pairwise relationships between CD4 protein sequences changed dramatically under the proposed method compared with pairwise percent identity. The proposed distance matrix had a positive impact in these case studies and is therefore expected to be useful in fields such as multiple protein sequence alignment and phylogenetic analysis, where an accurate distance matrix based on natural, generalized protein properties plays an important role.
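
The general idea of combining an amino acid index with the DFT can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how sequences of different lengths are compared or how the 25 indices are combined, so the sketch assumes a single index (Kyte-Doolittle hydropathy), zero-padding to a common length, and Euclidean distance between power spectra. The function names `encode`, `power_spectrum`, and `distance_matrix` are hypothetical.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values, used here as ONE example amino acid index.
# Assumption: the paper combines 25 indices; this sketch uses a single one.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, index):
    """Map a protein sequence to a numeric signal using one amino acid index."""
    return [index[aa] for aa in seq]


def power_spectrum(signal, n):
    """Zero-pad the signal to length n and return its DFT power spectrum."""
    padded = list(signal) + [0.0] * (n - len(signal))
    spectrum = []
    for k in range(n):
        # Direct O(n^2) DFT; fine for a sketch, an FFT would be used in practice.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(padded)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def distance_matrix(seqs, index):
    """Pairwise Euclidean distances between the power spectra of the sequences.

    Assumption: sequences are zero-padded to the longest length so that
    their spectra are directly comparable.
    """
    n = max(len(s) for s in seqs)
    spectra = [power_spectrum(encode(s, index), n) for s in seqs]
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in spectra]
        for p in spectra
    ]


# Toy sequences for illustration only (not CD4 data).
toy = ["MKVLA", "MKVLA", "GSTPWQ"]
matrix = distance_matrix(toy, KYTE_DOOLITTLE)
```

Under these assumptions, identical sequences are at distance zero and the matrix is symmetric. Extending the sketch to several indices would require choosing a combination rule, for example averaging the per-index matrices, which is likewise an assumption rather than something stated in the abstract.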
