Abstract

Protein secondary structure prediction is an important topic in bioinformatics. This paper proposes a novel model named WS-BiLSTM, which combines the wavelet scattering convolutional network and the long short-term memory network for the first time to predict protein secondary structure. This model captures nonlocal interactions between amino acid sequences and remembers long-range interactions between amino acids. In our WS-BiLSTM model, the wavelet scattering convolutional network is used to extract protein features from the PSSM sliding window; the extracted features are then combined with the original PSSM data to form the input features of the long short-term memory network, which predicts the protein secondary structure. It is worth noting that the wavelet scattering convolutional network is asymmetric as a member of the continuous wavelet family. The Q3 accuracy on the test sets CASP9, CASP10, CASP11, CASP12, CB513, and PDB25 reached 85.26%, 85.84%, 84.91%, 85.13%, 86.10%, and 85.52%, which is 2.15%, 2.16%, 3.5%, 3.19%, 4.22%, and 2.75% higher, respectively, than the accuracy obtained using the long short-term memory network alone. A comparison with state-of-the-art methods shows that our proposed model achieves better results on the CB513 and CASP12 data sets. The experimental results show that the features extracted by the wavelet scattering convolutional network can effectively improve the accuracy of protein secondary structure prediction.
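
As an illustration of the data flow described above, the following minimal Python sketch shows how a PSSM could be cut into per-residue sliding windows, how extracted features could be concatenated with the original PSSM rows, and how Q3 accuracy is computed as the percentage of residues whose three-state label is predicted correctly. It is not the authors' implementation: the window size of 13, the zero padding at the sequence ends, all function names, and the column-mean `placeholder_features` stand-in (used here in place of an actual wavelet scattering transform, which is not reproduced) are assumptions made for illustration only.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sliding_windows(pssm: Sequence[Sequence[float]], window: int = 13) -> List[Matrix]:
    """Return one (window x n_cols) slice per residue, centered on that residue.

    The window size and the zero padding at both ends are assumptions;
    the source does not state them.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not pssm:
        return []
    half = window // 2
    n_cols = len(pssm[0])
    pad = [[0.0] * n_cols for _ in range(half)]
    padded = pad + [list(row) for row in pssm] + pad
    return [padded[i:i + window] for i in range(len(pssm))]


def placeholder_features(window_slice: Matrix) -> List[float]:
    """Hypothetical stand-in for the wavelet scattering step: column means."""
    n_rows = len(window_slice)
    return [sum(col) / n_rows for col in zip(*window_slice)]


def combine_features(extracted: Sequence[Sequence[float]],
                     pssm: Sequence[Sequence[float]]) -> Matrix:
    """Concatenate each residue's extracted features with its raw PSSM row."""
    if len(extracted) != len(pssm):
        raise ValueError("one feature vector is required per residue")
    return [list(feat) + list(row) for feat, row in zip(extracted, pssm)]


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy PSSM with 3 residues and 2 columns (a real PSSM has 20 columns).
    toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    windows = sliding_windows(toy_pssm, window=3)
    features = [placeholder_features(w) for w in windows]
    lstm_input = combine_features(features, toy_pssm)
    print(len(lstm_input), len(lstm_input[0]))
    print(q3_accuracy("HHEC", "HHCC"))  # 3 of 4 residues match: 75.0
```

In this sketch the per-residue vectors returned by `combine_features` would be the sequence fed to the recurrent network; the network itself is omitted because its layer sizes and training settings are not given in this section.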

Highlights

  • Proteins are essential components of organisms and carry out immunity, cellular signal transmission, and other functions

  • This paper proposes a protein secondary structure prediction method based on the wavelet scattering convolutional network and the long short-term memory network

  • To evaluate the accuracy of the proposed model and verify the effectiveness of the wavelet scattering convolutional network for feature extraction, two separate experiments were set up to predict protein secondary structure


Summary

Introduction

Proteins are essential components of organisms and carry out immunity, cellular signal transmission, and other functions. Protein structure can be divided into primary, secondary, tertiary, and quaternary structures. Inspired by their great success in computer vision [1], speech recognition [2], and emotion classification [3], deep learning methods have been widely used in many biological research fields [4,5]. Examples include the prediction of protein contact maps [6], drug-target binding affinity [7,8], chromatin accessibility [9], and protein function [10,11], as well as the use of Support Vector Machines (SVMs) for protein structure prediction [12]. The main advantage of deep learning methods is that they can automatically represent the original sequence and learn hidden patterns through nonlinear transformations [13].

