Abstract

Although genes carry information, proteins are the main role player in providing all the functionalities of a living organism. Massive amounts of different proteins involve in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into a set of families and subfamilies depending on their evolutionary relatedness and similarities in their structure or function. Protein characterization to identify protein structure and function is done accurately using laboratory experiments. With the rapidly increasing huge amount of novel protein sequences, these experiments have become difficult to carry out since they are expensive, time-consuming, and laborious. Therefore, many computational classification methods are introduced to classify proteins and predict their functional properties. With the progress of the performance of the computational techniques, deep learning plays a key role in many areas. Novel deep learning models such as DeepFam, ProtCNN have been presented to classify proteins into their families recently. However, these deep learning models have been used to carry out the non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam with high accuracy to classify proteins hierarchically into different levels simultaneously. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families and achieved an accuracy of 98.62% and 96.14% for the popular Pfam dataset and COG dataset respectively.

Highlights

  • Proteins are the main functional body of living organisms

  • According to the main functional roles played by proteins, they can be categorized into different groups such as structural, contractile, transport, enzyme, storage, hormonal, and protection [1]

  • We propose a model for the hierarchical classification of proteins at multiple levels based on DeepFam [6] with lesser complexity, computational cost, and training time

Read more

Summary

Introduction

Proteins are the main functional body of living organisms. Inside cells, there is a large number of proteins involved in different unique functionalities such as growth and maintenance, causing biochemical reactions, acting as a messenger, providing structures and protection. The model was tested using cross-fold validation and the experiments were conducted on two large-scale datasets of the SwissProt database [24] In another recent work, A deep CNN [25] has been trained to classify 521,527 sequences from the Uniprot database with 698 families of class labels showing AUC accuracy of 99.99% [26]. A deep CNN [25] has been trained to classify 521,527 sequences from the Uniprot database with 698 families of class labels showing AUC accuracy of 99.99% [26] It consists of 6 convolutional layers with 2 fully connected layers with nearly 1 million total network parameters. DeepFam model classifies protein sequences to in multiple levels: families, subfamilies, sub-subfamilies in separate rounds and authors have emphasized the importance of hierarchal classification or multi-task algorithm in deep learning.

Methodology
Categorical Cross Entropy
Evaluation metrics
À score
Method The Proposed Model
Conclusion
Method
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call