Abstract
The amino acid sequence of a protein contains all the information necessary to specify its shape, which dictates its biological activities. However, experimentally determining the three-dimensional structure of proteins is challenging and expensive. The backbone torsion angles play a critical role in protein structure prediction, and predicting them accurately can considerably advance tertiary structure prediction by making the sampling of the large conformational space for low-energy structures more efficient. Here we propose, for the first time, evolutionary signatures computed from protein sequence profiles, together with a novel recurrent architecture, termed ESIDEN, that adopts a straightforward recurrent neural network design with a small number of learnable parameters. ESIDEN captures useful information from both the classic and the new features, benefiting from the different ways in which its recurrent architectures process information. Compared with the widely used classic features, the new features, especially the Ramachandran basin potential, provide statistical and evolutionary information that improves prediction accuracy. On four widely used benchmark datasets, ESIDEN significantly improves the accuracy of torsion angle prediction compared with the best methods to date. As demonstrated in the present study, the predicted angles can be used as structural constraints to accurately infer protein tertiary structures. Moreover, the proposed features could pave the way to improving machine learning-based methods for protein folding and structure prediction, as well as function prediction. The source code and data are available at https://kornmann.bioch.ox.ac.uk/leri/resources/download.html.
Highlights
The amino acid sequence of a protein contains all the necessary information to specify its shape, which dictates its biological activities
The ESIDEN network is implemented in PyTorch v1.7.0 [47] and trained on high-performance computing clusters using one NVIDIA GTX2080Ti graphics processing unit (GPU)
We demonstrate that the four novel features (RE, degree of conservation (DC), position-specific substitution probabilities (PSSP), and Ramachandran basin potential (RBP)) improve the accuracy of torsion angle prediction when combined with the basic features; the performance of ESIDEN is assessed with different feature combinations on the same dataset
Summary
The amino acid sequence of a protein contains all the information necessary to specify its shape, which dictates its biological activities. The backbone torsion angles play a critical role in protein structure prediction, and predicting them accurately can considerably advance tertiary structure prediction by making the sampling of the large conformational space for low-energy structures more efficient. The proposed features could pave the way to improving machine learning-based methods for protein folding and structure prediction, as well as function prediction. Owing to larger protein databases, growing computing resources, and advances in machine learning methods and deep neural networks, the accuracy of protein backbone torsion angle prediction has steadily improved. In OPUS-TASS, the same network was trained for six different tasks, including secondary structure, backbone torsion angles (TA), discrete descriptors of local backbone structure, solvent accessible surface area, and side-chain dihedral angles.
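For readers unfamiliar with the quantity being predicted, the following is a minimal sketch of how a torsion (dihedral) angle can be computed from four atomic positions. It is not taken from the ESIDEN source code: the function name `dihedral_degrees` and the helper functions are hypothetical, and the coordinates in the usage line are made-up points rather than real protein data. The sketch uses the standard atan2 formulation, with positive angles following the usual clockwise convention when looking from the second atom towards the third.

```python
import math


def _sub(a, b):
    # Component-wise difference a - b of two 3D points.
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    # Dot product of two 3D vectors.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    # Cross product of two 3D vectors.
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], defined by four points.

    Hypothetical helper for illustration; assumes the points are not collinear.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)  # central bond, the rotation axis
    b2 = _sub(p3, p2)

    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3

    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = (b1[0] / b1_len, b1[1] / b1_len, b1[2] / b1_len)

    x = _dot(n1, n2)                    # proportional to cos(angle)
    y = _dot(b1_unit, _cross(n1, n2))   # proportional to sin(angle)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the last point is a quarter turn away from the first
# about the axis joining the two middle points.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a residue i, the backbone angle phi is the dihedral over the atoms C(i-1), N(i), CA(i), C(i), and psi is the dihedral over N(i), CA(i), C(i), N(i+1), so the same function would serve for both given the corresponding coordinates.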