Abstract

Hypothetical proteins are proteins whose existence has been predicted but for which experimental evidence about their structure, function, or linkage to any known gene is lacking. Genome sequencing has produced numerous predicted open reading frames to which no structure or function can readily be assigned, and these can make up a significant portion of a genome. In this study, we designed a pipeline for the efficient functional annotation of short hypothetical proteins (fewer than 400 amino acids) and compared its performance in two case studies, using amino acid sequence information retrieved from two different protein databases. The likely functions of the hypothetical proteins were investigated in silico with computational methods and tools based on sequence similarity, identification of targeting signals, presence of known protein domains, physicochemical characterization, and related criteria. In case study 1, our pipeline annotated 90 of 100 hypothetical proteins, whereas the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) database annotated 82, a gain of 8 percentage points. In case study 2, the pipeline annotated 78 of 96 hypothetical proteins, whereas eggNOG annotated 58, a gain of about 20.83 percentage points. Some hypothetical proteins had a high aliphatic index, suggesting greater thermostability in extreme environments. Subcellular localization was also predicted with higher accuracy for cytoplasmic and membrane proteins than for other proteins. Hypothetical proteins can provide insight into previously unknown protein structures and functions and are an important area for further research.
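The abstract restricts the study to proteins shorter than 400 residues and uses the aliphatic index as one physicochemical descriptor. The sketch below illustrates both steps, assuming the standard Ikai aliphatic index as computed by tools such as ExPASy ProtParam: AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X as mole percent. The function names and toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
# Hypothetical sketch: length filter plus aliphatic index (Ikai formula,
# as reported by ExPASy ProtParam). Not the authors' implementation.

MAX_LENGTH = 400  # "short" hypothetical proteins: fewer than 400 residues


def is_short(sequence: str, max_length: int = MAX_LENGTH) -> bool:
    """Return True if the sequence has fewer than max_length residues."""
    return len(sequence) < max_length


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percents of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        # Percentage of positions occupied by this residue
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy sequences (hypothetical), not data from the study
    toy = {"hp_1": "MAVILKAVLE", "hp_2": "G" * 450}
    for name, seq in toy.items():
        if is_short(seq):
            print(name, round(aliphatic_index(seq), 2))
        else:
            print(name, "skipped: length >= 400")
```

A higher index reflects a larger share of aliphatic side chains, which is why the abstract reads high values as a hint of thermostability rather than as proof of it.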
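The reported gains are easiest to check as differences in annotation coverage over each case study's full set. The short calculation below uses only the counts stated in the abstract; the dictionary and function names are illustrative.

```python
# Annotation counts stated in the abstract: (total proteins, pipeline, eggNOG)
CASE_STUDIES = {
    "case study 1": (100, 90, 82),
    "case study 2": (96, 78, 58),
}


def coverage_gain(total: int, pipeline: int, eggnog: int):
    """Return (pipeline coverage %, eggNOG coverage %, gain in percentage points)."""
    pipeline_pct = 100.0 * pipeline / total
    eggnog_pct = 100.0 * eggnog / total
    return pipeline_pct, eggnog_pct, pipeline_pct - eggnog_pct


for name, (total, pipeline, eggnog) in CASE_STUDIES.items():
    p, e, gain = coverage_gain(total, pipeline, eggnog)
    print(f"{name}: pipeline {p:.2f}%, eggNOG {e:.2f}%, gain {gain:.2f} points")
```

Measured relative to eggNOG's own counts instead, the increases would be about 9.76% (90/82) and 34.48% (78/58), so the baseline matters when comparing the two figures.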

