Abstract

Similarity/dissimilarity analysis is a key way of understanding the biology of an organism, because it helps identify the origin of new genes and sequences. Sequence data are grouped according to their biological relationships, and the number of sequences belonging to any group can grow daily. Existing alignment-free methods demonstrate their utility by producing a similarity/dissimilarity matrix. Although this matrix is clear, it only measures the degree of similarity between individual pairs of sequences. In our work, a representative is introduced for each of three groups of protein sequences. Instead of the ordinary similarity/dissimilarity matrix, a similarity/dissimilarity vector is evaluated against the group representative. The approach is applied to three selected groups of protein sequences: beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences. A cross-group comparison is carried out to confirm that each group is distinct. A qualitative comparison of our approach with previous articles and with the phylogenetic tree of these protein sequences demonstrates the utility of our approach.

Highlights

  • Sequence comparison is used to study structural and functional conservation and evolutionary relations among the sequences

  • The similarity/dissimilarity vectors corresponding to the beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences are given in Tables 9, 10, and 11, respectively, based on the two methods discussed before

  • A group representative vector is introduced to represent each group of protein sequences

Summary

Introduction

Sequence comparison is used to study structural and functional conservation and evolutionary relations among sequences. Three groups of protein sequences are selected to illustrate our approach: beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences. They are selected because the sequences within each group have a similar range of lengths. The adjacency vector is introduced as a novel descriptor for protein sequences. It is computed for each sequence in the selected sample of three groups. Our approach is independent of the protein sequence length, and it does not require any previous graphical representation. Their range of lengths is from 121 to 147.
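The idea of replacing a pairwise matrix with a single vector of distances to a group representative can be sketched as follows. This is only an illustration under stated assumptions: the adjacency vector is not defined in this summary, so a plain amino-acid composition vector stands in for it, the representative is assumed to be the component-wise mean of the group's descriptors, and Euclidean distance is an assumed dissimilarity measure. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(seq):
    """Stand-in descriptor (NOT the paper's adjacency vector):
    relative frequency of each of the 20 standard amino acids."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]

def group_representative(descriptors):
    """Assumed representative: component-wise mean of the group's descriptors."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]

def euclidean(u, v):
    """Assumed dissimilarity measure between two descriptors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dissimilarity_vector(sequences, representative):
    """One value per sequence, measured against the group representative,
    instead of one value per pair of sequences."""
    return [euclidean(composition_vector(s), representative) for s in sequences]

# Toy sequences for illustration only; these are not real protein data.
group = ["ACDEFG", "ACDEFF", "ACDEGG"]
rep = group_representative([composition_vector(s) for s in group])
vector = dissimilarity_vector(group, rep)

# Cross-group check: a sequence from outside the group should lie
# farther from the representative than any group member does.
outsider = dissimilarity_vector(["WWWWYY"], rep)[0]
```

For a group of n sequences this yields n values rather than the n × n entries of a pairwise matrix, and a newly added sequence needs only one comparison against the representative. Because the stand-in descriptor uses relative frequencies, it also does not depend on sequence length, which mirrors the length independence claimed for the approach.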

The Adjacency Vector
The Group Representative Vector
Cross-Group Comparison
Conclusion