Abstract

In recent years, deep learning (DL) techniques have been applied to the structural and functional analysis of proteins in bioinformatics, especially to 8-state (Q8) protein secondary structure prediction (PSSP). In this paper, we explore the performance of various DL architectures for Q8 PSSP by developing six architectures built from convolutional neural networks (CNNs), recurrent neural networks (RNNs), and combinations of the two. These architectures are: CNN-SW (CNNs with a sliding window); CNN-WP (CNNs with the whole protein as input); LSTM+ (Long Short-Term Memory (LSTM) and bidirectional LSTM (BLSTM)); GRU+ (Gated Recurrent Unit (GRU) and bidirectional GRU (BGRU)); CNN-BGRU (CNNs and BGRUs); and CNN-BLSTM (CNNs and BLSTMs). All of them include batch normalization, dropout, and fully-connected layers. We used the CB6133 dataset for training and the CB513 dataset for testing. The experiments showed that combining CNNs with BLSTMs or BGRUs overcame overfitting and achieved better Q8 accuracy, precision, recall, and F-score. On CB513, CNN-SW, CNN-BGRU, and CNN-BLSTM achieved Q8 accuracy comparable with some state-of-the-art models.
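As a rough illustration of two ideas mentioned above, the following sketch shows how per-residue Q8 accuracy is commonly computed and how a sliding-window input (as in CNN-SW) might be cut from a sequence. It is not the authors' code: the function names, the padding symbol `X`, the window size, and the 8-letter label alphabet are assumptions made for this example only.

```python
# Illustrative sketch only; names, padding symbol, and window size are assumptions.

# One common 8-letter encoding of DSSP states (assumed here; 'L' stands for coil/other).
Q8_STATES = "GHIEBTSL"


def q8_accuracy(true_ss: str, pred_ss: str) -> float:
    """Fraction of residues whose predicted 8-state label matches the true label."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have the same length")
    if not true_ss:
        raise ValueError("label strings must be non-empty")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)


def sliding_windows(sequence: str, window: int = 5, pad: str = "X") -> list[str]:
    """Return one fixed-size window per residue, centred on that residue.

    The sequence is padded on both ends so that terminal residues
    also get a full-length window.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window] for i in range(len(sequence))]


if __name__ == "__main__":
    print(q8_accuracy("HHHEEL", "HHHELL"))
    print(sliding_windows("ACD", window=3))
```

In a sliding-window setup, each window would be encoded (for example, one-hot or with a profile) and classified into one of the eight states for its central residue, whereas a whole-protein model labels all residues of the sequence in one pass.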
