Abstract

Protein three-dimensional (3D) structures provide valuable information in many fields of biology. One-dimensional properties derived from 3D structures, such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles, are useful for protein function prediction, fold recognition and ab initio folding. Here, we predict these structural features using neural networks. On an independent test dataset, secondary structure prediction achieves an overall Q3 accuracy of ~80%. Relative solvent accessibility prediction has the highest mean absolute error (0.164), and residue depth prediction has the lowest (0.062). We further improve outer membrane protein identification by including the predicted structural features in a scoring function based on a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification improves by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two easy-to-use programs: PSSM-2-Features, for predicting secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, and PPA-OMP, for identifying outer membrane proteins from proteomes.

Highlights

  • For structurally known proteins, these properties can be directly obtained from Protein Data Bank (PDB) coordinates using computational programs

  • The pipeline consists of three modules: (1) prediction of structural features, (2) identification of outer membrane proteins (OMPs), and (3) modeling of 3D structures for potential OMPs

  • A query sequence is threaded by PSI-BLAST through the NCBI NR database for three iterations with an e-value threshold of


Summary

Results

When protein secondary structure was assigned by STRIDE, the QH, QE and QC values of PSSM-2-Features were 0.869, 0.728 and 0.764, respectively. RSA was the most difficult property to predict, with an MAE of 0.164 on the SCOPe_TEST1073 dataset. This result may suggest that solvent accessibility is less conserved within protein families than other properties (e.g., secondary structure), which is consistent with the findings of Rost and Sander [24]. The distributions of the Pcc scores for the proteins in the SCOPe_TEST1073 dataset for RSA and RD are shown in Figs 3 and 4, respectively.
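The evaluation measures named above are not defined in this excerpt. The following is a minimal sketch under the usual conventions, which are assumptions here: a per-state accuracy (QH, QE, QC) is taken as the fraction of residues observed in a state that are predicted in that same state, Q3 as the fraction of all residues predicted correctly, MAE as the mean absolute difference between predicted and observed values, and Pcc as the Pearson correlation coefficient. All function names and the toy data are hypothetical and are not taken from PSSM-2-Features.

```python
from math import sqrt


def q3(observed, predicted):
    """Fraction of residues whose predicted state equals the observed state."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def per_state_accuracy(observed, predicted, state):
    """Assumed definition: correctly predicted residues in `state`
    divided by the number of residues observed in `state`."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o == state]
    if not pairs:
        return float("nan")  # state never observed in this protein
    return sum(o == p for o, p in pairs) / len(pairs)


def mean_absolute_error(observed, predicted):
    """Mean absolute difference between observed and predicted real values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy three-state strings (H = helix, E = strand, C = coil); not real data.
    obs_ss = "HHHHEEECCC"
    pred_ss = "HHHCEEECCH"
    print("Q3:", q3(obs_ss, pred_ss))
    for s in "HEC":
        print(f"Q{s}:", per_state_accuracy(obs_ss, pred_ss, s))

    # Toy relative solvent accessibility values in [0, 1]; not real data.
    obs_rsa = [0.1, 0.5, 0.9]
    pred_rsa = [0.2, 0.4, 0.7]
    print("MAE:", mean_absolute_error(obs_rsa, pred_rsa))
    print("Pcc:", pearson_cc(obs_rsa, pred_rsa))
```

In this sketch, Pcc would be computed once per protein, which matches the per-protein score distributions described for Figs 3 and 4; whether MAE is averaged per residue or per protein in the original work is not stated in this excerpt.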

