Improving protein secondary structure prediction based on short subsequences with local structure similarity

Hsin-Nan Lin, Ting-Yi Sung, Wen-Lian Hsu, Shinn-Ying Ho

doi:10.1186/1471-2164-11-s4-s4

Abstract

Background: When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures. Although PSS prediction has been studied for decades, prediction accuracy has reached a bottleneck at around 80%, and further improvement is very difficult.

Results: In this paper, we present an improved dictionary-based PSS prediction method called SymPred, and a meta-predictor called SymPsiPred. We adopt the concept behind natural language processing techniques and propose synonymous words to capture local sequence similarities in a group of similar proteins. A synonymous word is an n-gram pattern of amino acids that reflects the sequence variation in a protein’s evolution. We generate a protein-dependent synonymous dictionary from a set of protein sequences for PSS prediction.

On a large non-redundant dataset of 8,297 protein chains (DsspNr-25), the average Q3 values of SymPred and SymPsiPred are 81.0% and 83.9%, respectively. On the two latest independent test sets (EVA Set_1 and EVA_Set2), the average Q3 of SymPred is 78.8% and 79.2%, respectively. SymPred outperforms other existing methods by 1.4% to 5.4%. We study two factors that may affect the performance of SymPred and find that it is very sensitive to the number of proteins of both known and unknown structures. This finding implies that SymPred and SymPsiPred have the potential to achieve higher accuracy as the number of protein sequences in the NCBInr and PDB databases increases.

Conclusions: Our experimental results show that local similarities in protein sequences typically exhibit conserved structures, which can be used to improve the accuracy of secondary structure prediction. As an application of synonymous words, we demonstrate a sequence alignment generated from the distribution of synonymous words shared by a pair of protein sequences. The two sequences, which are very dissimilar at the sequence level but very similar at the structural level, can be aligned nearly perfectly. The SymPred and SymPsiPred prediction servers are available at http://bio-cluster.iis.sinica.edu.tw/SymPred/.
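Two notions from the abstract can be made concrete with a small sketch: a word as an n-gram of amino acids, and Q3 as the percentage of residues whose three-state label is predicted correctly. The function names `ngrams` and `q3`, the word length of 3, the H/E/C label alphabet, and the toy sequences are assumptions made for illustration; this is not the SymPred implementation, and it does not model synonymous words.

```python
def ngrams(sequence: str, n: int) -> list[str]:
    """Return all overlapping length-n words of an amino acid sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly.

    Labels are assumed to be H (helix), E (strand) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("structure strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy data, invented for illustration only.
words = ngrams("MKVLAT", 3)
score = q3("HHHECC", "HHHHCC")
```

Here `words` holds the four overlapping 3-grams of the toy sequence, and `score` is the share of matching positions between the two toy structure strings, expressed as a percentage.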

Highlights

When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures.
We used DsspNr-25 as the validation set to determine the parameters of SymPred by leave-one-out cross validation (LOOCV), since LOOCV has been shown to provide an almost unbiased estimate of the generalization error [37] and makes the most use of the data. (Unlike most machine learning methods, SymPred does not need to rebuild its model when using LOOCV.) Once the parameters of SymPred, including the length n of a word and the dictionary, were determined, we used the validation set DsspNr-25 to evaluate the performance of SymPred and SymPsiPred by 10-fold cross validation and LOOCV.
SymPred’s performance improves by between 0.5% and 2.8% each time the number of template proteins is increased by 10%. With more protein sequences in the template pool, the synonymous dictionary can learn more synonymous words from those sequences and their similar protein sequences.
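The remark that a dictionary-based method need not be rebuilt for LOOCV can be sketched as follows. This is a toy exact-match n-gram dictionary with majority voting, not SymPred itself: the names `build_dictionary`, `predict` and `loocv_q3`, the `exclude` parameter, the coil default for residues with no votes, and the three toy proteins are all assumptions made for illustration. The point is only that the dictionary is built once and the held-out protein's own entries are skipped at lookup time.

```python
from collections import Counter, defaultdict


def build_dictionary(templates, n):
    """Map each n-gram to a list of (protein_id, structure n-gram) entries."""
    dictionary = defaultdict(list)
    for pid, (seq, ss) in templates.items():
        for i in range(len(seq) - n + 1):
            dictionary[seq[i:i + n]].append((pid, ss[i:i + n]))
    return dictionary


def predict(seq, dictionary, n, exclude=None):
    """Predict one label per residue by majority vote over matching words.

    Entries that come from the protein `exclude` are ignored, so the
    dictionary does not have to be rebuilt for each held-out protein.
    """
    votes = [Counter() for _ in seq]
    for i in range(len(seq) - n + 1):
        for pid, ss_word in dictionary.get(seq[i:i + n], ()):
            if pid == exclude:
                continue
            for k, label in enumerate(ss_word):
                votes[i + k][label] += 1
    # Residues with no votes default to coil (an assumption of this sketch).
    return "".join(v.most_common(1)[0][0] if v else "C" for v in votes)


def loocv_q3(templates, n):
    """Leave-one-out Q3 per protein, reusing a single dictionary."""
    dictionary = build_dictionary(templates, n)  # built once
    scores = {}
    for pid, (seq, ss) in templates.items():
        pred = predict(seq, dictionary, n, exclude=pid)
        correct = sum(p == o for p, o in zip(pred, ss))
        scores[pid] = 100.0 * correct / len(ss)
    return scores


# Toy template pool, invented for illustration only.
templates = {
    "p1": ("AKLMAKLM", "HHHHHHHH"),
    "p2": ("AKLMGGSG", "HHHHCCCC"),
    "p3": ("GGSGGGSG", "CCCCCCCC"),
}
scores = loocv_q3(templates, n=4)
```

A larger template pool gives each query word more entries to vote with, which is the intuition behind the reported gain as the number of template proteins grows.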

Summary

Introduction

When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures. Many researchers employ PSS as a feature to predict the tertiary structure [1,2,3,4], function [5,6,7,8], or subcellular localization [9,10,11] of proteins. Among the various features used to predict protein function, such as amino acid composition, disorder patterns, and signal peptides, PSS makes the largest contribution [12]. It has been suggested that secondary structure alone may be sufficient for accurate prediction of a protein’s tertiary structure [13].

Journal: BMC Genomics
Publication Date: Dec 1, 2010
Citations: 71
License type: cc-by

Similar Papers

SPINE X: improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles.
Eshel Faraggi ... Yuedong Yang
Journal of Computational Chemistry | VOL. 33
02 Nov 2011

OneHotEncoding and LSTM-based deep learning models for protein secondary structure prediction
Vamsidhar Enireddy ... C. Karthikeyan
Soft Computing | VOL. 26
12 Feb 2022

Implementation of a hybrid Neuro Fuzzy Genetic System for improving protein secondary structure prediction
Andey Krishnaji ... Allam Appa Rao
01 Nov 2012

Data Mining for Protein Secondary Structure Prediction
Haitao Cheng ... Taner Z Sen
01 Jan 2009
