Abstract

Protein secondary structure prediction (SSP) has a variety of applications; however, its accuracy has improved relatively little for years. With a view to advancing all related fields, we aimed to make a fundamental advance in SSP. Many admirable efforts have been made to improve the machine learning algorithms for SSP, so this work took a step back and instead manipulated the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, on which a new set of machine learning features can be established. The feasibility of this new PSSM was evaluated by rigorous independent tests in which the training and testing datasets shared <25% sequence identity. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. The new PSSM can be easily combined with the amino acid PSSM, and the resulting improvement in accuracy was remarkable. Preliminary tests combining the SSE-PSSM with well-known SSP methods showed average improvements of 2.0% and 5.2% in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break through its current limit and eventually benefit all research and applications in which secondary structure prediction plays a vital role. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating the SSE-PSSM, available at http://10.life.nctu.edu.tw/SSE-PSSM.

Highlights

  • Secondary structure prediction of a protein means determining the secondary structural conformation of each residue based solely on the amino acid sequence

  • With a protein sequence similarity search program such as PSI-BLAST [53] or HHBlits [68] and a protein structure dataset, a set of aligned amino acid sequences can be transformed into aligned SSE sequences, which are then used to construct the SSE position-specific scoring matrix (SSE-PSSM)

  • If the query protein is so novel that few homologs exist in the target dataset, the propensity matrix (PPM) generated from the aligned amino acid sequences may contain too many zeros to maintain the quality of the generated PSSM
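The matrix construction described in the highlights above can be illustrated with a minimal sketch. The text here does not give the scoring formula, so the example assumes a conventional log-odds PSSM with a background-weighted pseudocount; the function names, the three-state alphabet `HEC`, the gap symbol `-`, and the pseudocount value are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

# Three-state alphabet (helix, strand, coil): an assumption for this sketch.
# An eight-state alphabet could be passed in the same way.
SSE_STATES = "HEC"


def column_counts(aligned_sse, col, states):
    """Count SSE states in one alignment column, ignoring gaps ('-')."""
    return Counter(seq[col] for seq in aligned_sse if seq[col] in states)


def sse_propensity_matrix(aligned_sse, states=SSE_STATES):
    """Per-column frequency of each SSE state (the propensity matrix).

    aligned_sse: list of equal-length strings over `states` plus '-'.
    Columns with no aligned homolog yield all-zero rows, which is the
    sparsity problem mentioned for novel query proteins.
    """
    matrix = []
    for col in range(len(aligned_sse[0])):
        counts = column_counts(aligned_sse, col, states)
        total = sum(counts.values())
        matrix.append({s: (counts[s] / total if total else 0.0) for s in states})
    return matrix


def sse_pssm(aligned_sse, background, pseudocount=1.0, states=SSE_STATES):
    """Log-odds scores per column (assumed formula, not from the source).

    The pseudocount is distributed according to `background`, so an empty
    column scores 0 for every state instead of producing log(0).
    """
    pssm = []
    for col in range(len(aligned_sse[0])):
        counts = column_counts(aligned_sse, col, states)
        total = sum(counts.values())
        row = {}
        for s in states:
            freq = (counts[s] + pseudocount * background[s]) / (total + pseudocount)
            row[s] = math.log2(freq / background[s])
        pssm.append(row)
    return pssm


# Toy alignment: SSE strings of three hypothetical homologs of a 4-residue query.
aligned = ["HHEC", "HHEC", "HEE-"]
uniform = {s: 1.0 / len(SSE_STATES) for s in SSE_STATES}
ppm = sse_propensity_matrix(aligned)
pssm = sse_pssm(aligned, uniform)
```

In this sketch each row of `pssm` holds one score per SSE state for one query position, so the rows could be appended to the corresponding rows of an amino acid PSSM to form a combined feature vector. The pseudocount is one simple way to keep scores finite when few homologs are found; other smoothing schemes would fit the same interface.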


Summary

Introduction

Secondary structure prediction of a protein means determining the secondary structural conformation of each residue based solely on the amino acid sequence. We believe that if the accuracy of SSP can be substantially improved, the research and applications that depend on it will all advance. This work aims to make a fundamental improvement in SSP, in the hope that, if the proposed algorithm is adopted by state-of-the-art SSP methods, the general accuracy of SSP will reach a new level. Thanks to many recent works [1,2,3,4,5,6,7,8,9,10,11], much progress has been made in the methodology and machine learning algorithms for SSP. This study instead focuses on developing a new set of features that can be used by all mature machine-learning-based SSP methods. The outcome of our efforts is a new position-specific scoring matrix (PSSM) composed of secondary structural elements instead of amino acid codes.

