Abstract

Protein secondary structures have been identified as the links in the physical process by which primary sequences, typically random coils, fold into functional tertiary structures that enable proteins to participate in a variety of biological events. An efficient protein secondary structure predictor is therefore important, especially when the structure of an amino acid sequence fragment has not been solved by high-resolution experiments such as X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance spectroscopy, which are usually time consuming and expensive. In this paper, a reductive deep learning model, MLPRNN, is proposed to predict either 3-state or 8-state protein secondary structures. The prediction accuracy of MLPRNN on the publicly available benchmark CB513 data set is comparable with that of other state-of-the-art models. More importantly, given its reductive architecture, MLPRNN could serve as a baseline for future developments.

Highlights

  • Proteins are biomacromolecules that function in various life processes, and many of them have been identified as drug targets for human diseases (Huang et al, 2016; Li et al, 2021)

  • MLPRNN could be improved with additional input features, such as those introduced by DNSS2, or with a larger training dataset such as TR12148

  • MLPRNN and DNSS2 use the same method of mapping Q8 to Q3
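
The Q8-to-Q3 mapping mentioned above can be illustrated with a minimal sketch. The text here does not spell out which reduction MLPRNN and DNSS2 share, so the table below is an assumption: it uses one widely used convention for DSSP 8-state labels (H, G, I to helix; E, B to strand; everything else to coil). The names `Q8_TO_Q3` and `q8_to_q3` are hypothetical and chosen only for illustration.

```python
# Assumed 8-state -> 3-state reduction (one common DSSP convention).
# The source does not state the exact mapping, so treat this table as
# an illustration rather than the authors' definition.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",  # helices -> H
    "E": "E", "B": "E",            # strands and bridges -> E
    "T": "C", "S": "C",            # turns and bends -> C
    "L": "C", "C": "C", "-": "C",  # loop/coil symbols vary by data set
}


def q8_to_q3(q8_labels: str) -> str:
    """Map a string of 8-state labels to 3-state labels, one per residue."""
    try:
        return "".join(Q8_TO_Q3[label] for label in q8_labels)
    except KeyError as err:
        raise ValueError(f"unknown 8-state label: {err.args[0]!r}") from None


if __name__ == "__main__":
    print(q8_to_q3("LLHHHHGGGTTEEEEBSL"))
```

Because two predictors can only be compared on Q3 accuracy if they reduce Q8 labels the same way, keeping the mapping in a single table like this makes the choice explicit and easy to swap.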


Summary

Introduction

Proteins are biomacromolecules that function in various life processes, and many of them have been identified as drug targets for human diseases (Huang et al, 2016; Li et al, 2021). Once released from the ribosome, polypeptide chains fold spontaneously into functional three-dimensional, or tertiary, structures (Anfinsen et al, 1961). These structures are usually determined by experiments, including X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance spectroscopy. Such experiments are often time consuming and expensive, which to a large extent explains the gap between the number of protein structures (∼150,000) deposited in the Protein Data Bank (PDB) (Berman et al, 2002) and the number of sequences (∼140,000,000) stored in the UniProtKB/TrEMBL database (The UniProt Consortium, 2017, 2018). The three-dimensional structure of a protein is determined largely by its amino acid sequence (Baker and Sali, 2001), which suggests that a protein structure can, in principle, be predicted from its amino acid sequence.

