Abstract

Background: Protein sequence analysis helps in the prediction of protein functions. As the number of proteins increases, it gives the bioinformaticians a challenge to analyze and study the similarity between them. Most of the existing protein analysis methods use Support Vector Machine. Deep learning did not receive much attention regarding protein analysis as it is noted that little work focused on studying the protein diseases classification. Objective: The contribution of this paper is to present a deep learning approach that classifies protein diseases based on protein descriptors. Methods: Different protein descriptors are used and decomposed into modified feature descriptors. Uniquely, we introduce using the Convolutional Neural Network model to learn and classify protein diseases. The modified feature descriptors are fed to the Convolutional Neural Network model on a dataset of 1563 protein sequences classified into 3 different disease classes: AIDS, Tumor suppressor, and Proto-oncogene. Results: The usage of the modified feature descriptors shows a significant increase in the performance of the Convolutional Neural Network model over Support Vector Machine using different kernel functions. One modified feature descriptor improved by 19.8%, 27.9%, 17.6%, 21.5%, 17.3%, and 22% for evaluation metrics: Area Under the Curve, Matthews Correlation Coefficient, Accuracy, F1-score, Recall, and Precision, respectively. Conclusion: Results show that the prediction of the proposed CNN model trained by modified feature descriptors significantly surpasses that of Support Vector Machine model.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call