Abstract

The comparison of protein sequences according to similarity is a fundamental aspect of today's biomedical research. With advances in sequencing technologies, the number of protein sequences in public databases is growing exponentially. The best-known sequence comparison methods are alignment based. They generally give excellent results when the sequences under study are closely related, but they are time consuming. Herein, a new alignment-free method is introduced. Our technique depends on a new graphical representation and a new descriptor. The graphical representation is a simple way to visualize protein sequences. The descriptor compresses the primary sequence into a single vector composed of only two values. Our approach gives good results with both short and long sequences in little computation time. It is applied to nine beta globin, nine ND5 (NADH dehydrogenase subunit 5), and 24 spike protein sequences. Correlation and significance analyses are also introduced to compare our similarity/dissimilarity results with the results of other approaches and with sequence homology.

Highlights

  • Information encoded in the genome of any organism plays a central role in defining the life of that organism

  • We introduce a new alignment-free method for protein sequences

  • Each amino acid in the protein sequence is represented by a number, and a new 2D graphical representation is suggested


Summary

Introduction

Information encoded in the genome of any organism plays a central role in defining the life of that organism. Each amino acid in the protein sequence is represented by a number, and a new 2D graphical representation is suggested. All the results and figures are produced in Mathematica 8. The approach is applied to nine beta globin, nine ND5 (NADH dehydrogenase subunit 5), and 24 coronavirus protein sequences, as illustrated in Tables 1–3, respectively. These datasets are selected to be different in length. The 2D graphical representation for human, chimpanzee, and opossum beta globin protein sequences is illustrated in Figures 2(a)–2(c), respectively. The 2D graphical representation of TGEVG from class I and GD03T0013 from SARS_CoV protein sequences is illustrated in Figures 4(a) and 4(b), respectively.
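The general shape of such an alignment-free pipeline (number per residue, 2D curve, two-value descriptor, distance between descriptors) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the paper: the numeric code `CODE`, the curve definition in `curve`, and the centroid-based `descriptor` are all hypothetical choices, since the summary does not give the actual mapping or descriptor formula.

```python
import math

# ASSUMPTION: a hypothetical numeric code, the 1-based index of each residue
# in an alphabetical list of the 20 standard amino acids. The paper's actual
# mapping is not given in this summary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def curve(seq):
    """Return 2D points (position, running sum of residue codes).

    ASSUMPTION: this curve is an illustrative choice, not the paper's
    graphical representation.
    """
    points, total = [], 0
    for pos, aa in enumerate(seq, start=1):
        total += CODE[aa]  # raises KeyError for non-standard residues
        points.append((pos, total))
    return points


def descriptor(seq):
    """Compress a sequence into a two-value vector.

    ASSUMPTION: the centroid of the curve divided by the sequence length,
    so that short and long sequences stay on a comparable scale.
    """
    pts = curve(seq)
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    return (mean_x / n, mean_y / n)


def distance(seq_a, seq_b):
    """Euclidean distance between the two descriptors (no alignment needed)."""
    return math.dist(descriptor(seq_a), descriptor(seq_b))
```

Because each sequence is reduced to two numbers, a full pairwise distance matrix costs one pass over each sequence plus a constant-time comparison per pair, which is where the speed advantage over alignment comes from. Such a matrix could then be passed to a tree-building or correlation routine.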

Protein Sequence Descriptor
The Phylogenetic Tree of the Protein
Conclusions