Similarity/Dissimilarity and Phylogenetic Analysis of Protein Sequences

Sanjay Sharma, Tazid Ali

doi:10.5530/ajbls.2022.11.6

Abstract

In this paper, we first arrange the twenty standard amino acids in descending order of their degeneracy numbers and, following this arrangement, assign each a two-component (2D) vector confined to the first quadrant. We then represent a protein sequence as a curve in 2D space by linking together, in sequence order, the vectors of its amino acids. The proposed representation is tested on the ND6 (NADH dehydrogenase subunit 6) protein sequences of eight different species, whose similarity is analyzed using mathematical descriptors called the similar factor and the similar matrix. The technique produces a phylogeny that is compatible with previously published results on the same data set. Statistical analysis shows that our approach has better correlations with multiple sequence alignments.
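
The construction described above can be illustrated with a minimal Python sketch. The degeneracy numbers are those of the standard genetic code, which is an assumption since the abstract does not state which code table is used. The alphabetical tie-breaking among amino acids of equal degeneracy and the evenly spaced unit vectors are also assumptions made only for illustration; the abstract does not give the actual vector assignment. The names `unit_vector` and `protein_curve` are hypothetical.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (assumption: the abstract does not say which code table is used).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "H": 2, "K": 2, "F": 2, "Y": 2,
    "M": 1, "W": 1,
}

# Descending degeneracy; ties are broken alphabetically (an assumption).
ORDER = sorted(DEGENERACY, key=lambda aa: (-DEGENERACY[aa], aa))


def unit_vector(rank, total=20):
    """Hypothetical assignment: unit vectors at evenly spaced angles
    strictly between 0 and 90 degrees, so both components are positive."""
    angle = (rank + 1) * (math.pi / 2) / (total + 1)
    return (math.cos(angle), math.sin(angle))


VECTORS = {aa: unit_vector(rank) for rank, aa in enumerate(ORDER)}


def protein_curve(sequence):
    """Link the amino-acid vectors head to tail, starting at the origin,
    and return the list of curve points."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for aa in sequence.upper():
        dx, dy = VECTORS[aa]  # raises KeyError for non-standard letters
        x += dx
        y += dy
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in protein_curve("MTYAL"):  # arbitrary example fragment
        print(f"{point[0]:.3f}\t{point[1]:.3f}")
```

Because every vector lies in the first quadrant, the curve rises monotonically in both coordinates and never crosses itself, and a sequence of n residues yields n + 1 points.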

Full Text