Protein Secondary Structure Prediction based on CNN and Machine Learning Algorithms

Mt. Akhi Khatun, Md. Nasim Adnan, Sk. Shalauddin Kabir, Romana Rahman Ema, Md. Alam Hossain, Syed Md. Galib

doi:10.14569/ijacsa.2022.0131108


Open Access


Abstract

Protein secondary structure prediction is one of the most important topics in computational biology. The complete structure of a protein can be characterized at four levels of complexity (primary, secondary, tertiary, and quaternary structure), all of which are determined by the amino acid sequence. Secondary structure refers to the local configuration of a protein's polypeptide backbone. In this paper, three machine-learning-based algorithms are proposed for predicting protein secondary structure. These prediction methods are enhanced by a convolutional neural network (CNN) model structure, with the Rectified Linear Unit (ReLU) as the activation function. The 2D CNN is combined with three machine learning algorithms: Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF). The SVM is used to classify unseen data, while NB and RF, which can address regression as well as classification problems, are also applied to the prediction task. In total, the 2D CNN and the hybrid 2D CNN-SVM, CNN-RF, and CNN-NB models are proposed in this experiment. These methods are implemented on the RS126, 25PDB, and CB513 datasets. Finally, the Q3 accuracies of all methods are compared on each dataset, and the improvements achieved are reported.
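The hybrid models pass features from a 2D CNN to a conventional classifier. The Python sketch below illustrates only that data flow; it is not the authors' implementation. It assumes a one-hot encoded sliding window of 7 residues as the 2D input (the abstract does not state the input encoding), uses four hypothetical, untrained 3x3 convolution kernels followed by ReLU and global max pooling, and substitutes a minimal Gaussian Naive Bayes for the classifier stage (the abstract does not name an NB variant). An SVM or Random Forest would occupy the same slot in the CNN-SVM and CNN-RF variants. The toy sequence, labels, and all function names are invented for illustration.

```python
import math
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one column each


def one_hot_window(seq, center, half_width=3):
    """Encode residues around `center` as a (2*half_width+1) x 20 binary matrix.
    Rows that fall outside the chain stay all-zero (padding). Assumed encoding."""
    rows = []
    for pos in range(center - half_width, center + half_width + 1):
        row = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(seq[pos])] = 1.0
        rows.append(row)
    return rows


def conv2d_relu(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in CNN libraries) + ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        out_row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            out_row.append(max(0.0, s))  # ReLU(x) = max(0, x)
        out.append(out_row)
    return out


def cnn_features(image, kernels):
    """Global max pooling: one value per kernel -> fixed-length feature vector."""
    return [max(max(row) for row in conv2d_relu(image, k)) for k in kernels]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes standing in for the classifier stage."""

    def fit(self, X, y, eps=1e-6):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.params = {}
        for label, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # eps keeps variances positive when a feature is constant in a class
            variances = [sum((v - m) ** 2 for v in col) / n + eps
                         for col, m in zip(zip(*rows), means)]
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict(self, X):
        def log_posterior(x, log_prior, means, variances):
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        # Pick the class with the highest (unnormalized) log posterior
        return [max(self.params, key=lambda c: log_posterior(x, *self.params[c]))
                for x in X]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical, untrained 3x3 kernels; a real CNN would learn these weights.
    kernels = [[[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)]
               for _ in range(4)]
    seq = "MKTAYIAKQRQISFVKSHFSRQ"      # toy chain (illustrative only)
    labels = "CCHHHHHHHHCCEEEEECCCCC"   # toy H/E/C labels, same length
    X = [cnn_features(one_hot_window(seq, i), kernels) for i in range(len(seq))]
    model = GaussianNaiveBayes().fit(X, labels)
    print("".join(model.predict(X)))
```

In a real pipeline the convolution kernels would first be trained inside the CNN, and the classifier would be evaluated on held-out chains rather than on its own training residues as in this toy run.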
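Q3 accuracy, the metric used to compare the methods, is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The following is a minimal Python sketch, assuming the labels have already been reduced to these three states and that residues are pooled across all chains (some studies instead average Q3 per chain). The function name `q3_accuracy` is hypothetical.

```python
from typing import Iterable, Tuple

STATES = frozenset("HEC")  # helix, strand (extended), coil


def q3_accuracy(chains: Iterable[Tuple[str, str]]) -> float:
    """Q3 = 100 * (correctly predicted residues) / (total residues),
    pooled over all (predicted, observed) label-string pairs."""
    correct = total = 0
    for predicted, observed in chains:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed labels differ in length")
        if not set(predicted) <= STATES or not set(observed) <= STATES:
            raise ValueError("labels must be H, E or C")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total


if __name__ == "__main__":
    # 7 of 8 residues match (only the last E/C differs)
    print(q3_accuracy([("HHHCCEEE", "HHHCCEEC")]))  # 87.5
```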
