Combining Prediction of Protein Aggregation Propensities with Prediction of Other One-Dimensional Properties
- # Prediction Of Secondary Structure
- # Prediction Of Protein Secondary Structure
- # Prediction Of Propensity
- # Aggregation Propensity
- # Protein Aggregation Propensities
- # Three-dimensional Structure Modeling
- # Solvent Accessible Area
- # Accurate Prediction Of Secondary Structure
- # Prediction Of Structure
- # Parkinson's Diseases
- Book Chapter
2
- 10.1007/978-1-4939-6406-2_5
- Oct 28, 2016
Accurate prediction of protein secondary structure and other one-dimensional structure features is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. SPINE-X is a software package to predict secondary structure as well as accessible surface area and dihedral angles ϕ and ψ. For secondary structure, SPINE-X achieves an accuracy of between 81 and 84% depending on the dataset and choice of tests. The Pearson correlation coefficient for accessible surface area prediction is 0.75, and the mean absolute errors for the ϕ and ψ dihedral angles are 20° and 33°, respectively. The source code and Linux executables for SPINE-X are available from Research and Information Systems at http://mamiris.com.
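The two evaluation measures quoted in this abstract can be illustrated with a minimal sketch. The function names `pearson` and `angular_mae` are hypothetical, and wrapping angle differences around the 360° period is an assumption made here for illustration; the abstract does not state how SPINE-X computes its errors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def angular_mae(predicted, observed):
    """Mean absolute error in degrees between two lists of angles.

    Assumption: differences are wrapped so that 170 and -170 degrees
    count as 20 degrees apart rather than 340.
    """
    total = 0.0
    for p, o in zip(predicted, observed):
        diff = abs(p - o) % 360.0
        total += min(diff, 360.0 - diff)
    return total / len(predicted)
```

With this wrapping, a prediction of 170° against an observed -170° contributes an error of 20°, which is the behaviour one would usually want for periodic quantities such as ϕ and ψ.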
- Research Article
11
- 10.3389/fbioe.2022.901018
- Jul 22, 2022
- Frontiers in bioengineering and biotechnology
Prediction of the protein secondary structure is a key issue in protein science. Protein secondary structure prediction (PSSP) aims to construct a function that can map the amino acid sequence into the secondary structure so that the protein secondary structure can be obtained according to the amino acid sequence. Driven by deep learning, the prediction accuracy of the protein secondary structure has been greatly improved in recent years. To explore a new technique of PSSP, this study introduces the concept of an adversarial game into the prediction of the secondary structure, and a conditional generative adversarial network (GAN)-based prediction model is proposed. We introduce a new multiscale convolution module and an improved channel attention (ICA) module into the generator to generate the secondary structure, and then a discriminator is designed to conflict with the generator to learn the complicated features of proteins. Then, we propose a PSSP method based on the proposed multiscale convolution module and ICA module. The experimental results indicate that the conditional GAN-based protein secondary structure prediction (CGAN-PSSP) model is workable and worthy of further study because of the strong feature-learning ability of adversarial learning.
- Research Article
8
- 10.1007/s00500-022-06783-9
- Feb 12, 2022
- Soft Computing
Protein Secondary Structure (PSS) prediction has emerged as a hot topic in bioinformatics. PSS helps to predict the tertiary structure and to understand protein structures, which in turn helps in designing drugs. Existing PSS prediction techniques achieve a Q3 accuracy of nearly 80% and have shown no improvement so far. In this paper, we propose a novel technique that uses amino acid sequences alone as the input feature; the respective feature vector matrix is passed through a deep learning model (DLM) for PSS prediction. We use one-hot encoding (OneHotEncoding) and the LSTM (Long Short-Term Memory) technique to predict PSS with higher accuracy. The OneHotEncoder is used to extract the local contexts of amino acid sequences, and the LSTM captures the long-distance interdependencies among amino acids. The overall implementation is carried out in MATLAB 2020a. The performance of this model is evaluated in terms of precision, recall, F1-score, and the accuracy of both Q3 and Q8 secondary structure predictions. The proposed scheme achieves Q3 accuracies of 86.54, 85.2, and 85.7% and Q8 accuracies of 77.8, 72.5, and 74.9% on the CullPDB, CASP10, and CASP11 benchmark datasets, respectively. The advantages of the proposed scheme include reduced computation time and better accuracy than the baseline models for PSS prediction.
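One-hot encoding of an amino acid sequence, the input representation this abstract describes, can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB implementation; the alphabet ordering, the name `one_hot_encode`, and the all-zero row for non-standard residues are assumptions.

```python
# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return a len(sequence) x 20 matrix with one row per residue.

    Each row has a single 1 at the column of its amino acid.
    Assumption: residues outside the 20 standard letters (e.g. 'X')
    are encoded as an all-zero row.
    """
    matrix = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(residue)
        if index is not None:
            row[index] = 1
        matrix.append(row)
    return matrix
```

The resulting matrix would then be fed, row by row, to a sequence model such as an LSTM; that step is omitted here.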
- Research Article
117
- 10.1186/1471-2105-8-201
- Jun 14, 2007
- BMC Bioinformatics
Background: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion: The predictive systems are publicly available at the address .
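The idea of a structural frequency profile built from aligned templates can be sketched as below. This is a hypothetical, unweighted illustration: the abstract does not specify how templates are weighted or encoded, so the three-letter alphabet `HEC`, the `-` symbol for unaligned positions, and the function name are all assumptions.

```python
from collections import Counter

STATES = "HEC"  # helix, strand, coil (assumed three-state alphabet)


def structural_frequency_profile(template_states):
    """Per-position frequencies of H/E/C over templates aligned to a query.

    template_states: list of equal-length strings, one per template,
    where '-' (assumed gap symbol) marks positions the template does not cover.
    Returns one dict per position, or None where no template covers it.
    """
    if not template_states:
        raise ValueError("at least one template is required")
    length = len(template_states[0])
    if any(len(t) != length for t in template_states):
        raise ValueError("templates must be aligned to the same length")

    profile = []
    for column in zip(*template_states):
        counts = Counter(s for s in column if s in STATES)
        covered = sum(counts.values())
        if covered == 0:
            profile.append(None)
        else:
            profile.append({s: counts[s] / covered for s in STATES})
    return profile
```

Positions with no template coverage return `None`, which is where a predictor would have to fall back on sequence information alone.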
- Research Article
248
- 10.1002/jcc.21968
- Nov 2, 2011
- Journal of Computational Chemistry
Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77 to 80% over the past decade. Here, we developed a multistep neural-network algorithm by coupling secondary structure prediction with prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved an 82.0% accuracy (Q3) based on 10-fold cross validation. That SPINE X surpasses 81% accuracy is further confirmed on an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins, and 117 CASP 9 targets (critical assessment of structure prediction techniques), with accuracies of 81.3%, 82.3%, and 81.8%, respectively. The prediction accuracy is further improved to 83.8% for the dataset of 2640 proteins if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. Comparison to the popular PSIPRED and CASP-winning structure-prediction techniques is made. SPINE X predicts the number of helices and sheets correctly for 21.0% of 1833 proteins, compared to 17.6% by PSIPRED. The comparison further shows that SPINE X consistently makes more accurate predictions for helical residues (6%) without overprediction, while PSIPRED makes more accurate predictions for coil residues (3-5%) and overpredicts them by 7%. The SPINE X server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
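The Q3 accuracy reported throughout these abstracts is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch follows; the eight-to-three state mapping shown is one commonly used reduction of DSSP codes and is an assumption here, since the abstract does not say which reduction SPINE X uses, and the function names are hypothetical.

```python
# One common DSSP 8-state to 3-state reduction (assumed, not taken from the abstract):
# H, G, I -> H (helix); E, B -> E (strand); everything else -> C (coil).
DSSP_TO_Q3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(dssp_string):
    """Map a string of DSSP codes to the three-state alphabet H/E/C."""
    return "".join(DSSP_TO_Q3.get(code, "C") for code in dssp_string)


def q3_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

Because the reduction changes which residues count as helix or strand, Q3 values computed under different mappings are not directly comparable, which is one reason reported accuracies vary with the assignment method.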
- Book Chapter
4
- 10.1007/978-3-642-04759-6_5
- Jan 1, 2009
Accurate protein secondary structure prediction from the amino acid sequence is essential for almost all theoretical and experimental studies on protein structure and function. After a brief discussion of application of data mining for optimization of crystallization conditions for target proteins we show that data mining of structural fragments of proteins from known structures in the protein data bank (PDB) significantly improves the accuracy of secondary structure predictions. The original method was proposed by us a few years ago and was termed fragment database mining (FDM) (Cheng H, Sen TZ, Kloczkowski A, Margaritis D, Jernigan RL (2005) Prediction of protein secondary structure by mining structural fragment database. Polymer 46:4314–4321). This method gives excellent accuracy for predictions if similar sequence fragments are available in our library of structural fragments, but is less successful if such fragments are absent in the fragments database. Recently we have improved secondary structure predictions further by combining FDM with classical GOR V (Kloczkowski A, Ting KL, Jernigan RL, Garnier J (2002a) Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins 49:154–66; Sen TZ, Jernigan RL, Garnier J, Kloczkowski A (2005) GOR V server for protein secondary structure prediction. Bioinformatics 21:2787–8) predictions to form a combined method, so-called consensus database mining (CDM) (Sen TZ, Cheng H, Kloczkowski A, Jernigan RL (2006) A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Sci 15:2499–506). FDM mines the structural segments of PDB, and utilizes structural information from the matching sequence fragments for the prediction of protein secondary structures. 
By combining it with the GOR V secondary structure prediction method, which is based on information theory and Bayesian statistics, coupled with evolutionary information from multiple sequence alignments (MSA), our CDM method guarantees improved accuracies of prediction. Additionally, with the constant growth in the number of new protein structures and folds in the PDB, the accuracy of the CDM method is clearly expected to increase in future. We have developed a publicly available CDM server (Cheng H, Sen TZ, Jernigan RL, Kloczkowski A (2007) Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: combining GOR V and Fragment Database Mining (FDM). Bioinformatics 23:2628–30) (http://gor.bb.iastate.edu/cdm) for protein secondary structure prediction.
- Research Article
7
- 10.1371/journal.pone.0254555
- Jul 14, 2021
- PloS one
The secondary structure prediction (SSP) of proteins has long been an essential structural biology technique with various applications. Despite its vital role in many research and industrial fields, in recent years, as the accuracy of state-of-the-art secondary structure predictors approaches the theoretical upper limit, SSP has been considered no longer challenging or too challenging to make advances. With the belief that the substantial improvement of SSP will move forward many fields depending on it, we conducted this study, which focused on three issues that have not been noticed or thoroughly examined yet but may have affected the reliability of the evaluation of previous SSP algorithms. These issues are all about the sequence homology between or within the developmental and evaluation datasets. We thus designed many different homology layouts of datasets to train and evaluate SSP prediction models. Multiple repeats were performed in each experiment by random sampling. The conclusions obtained with small experimental datasets were verified with large-scale datasets using state-of-the-art SSP algorithms. Very different from the long-established assumption, we discover that the sequence homology between query datasets for training, testing, and independent tests exerts little influence on SSP accuracy. Besides, the sequence homology redundancy between or within most datasets would make the accuracy of an SSP algorithm overestimated, while the redundancy within the reference dataset for extracting predictive features would make the accuracy underestimated. Since the overestimating effects are more significant than the underestimating effect, the accuracy of some SSP methods might have been overestimated. 
Based on the discoveries, we propose a rigorous procedure for developing SSP algorithms and making reliable evaluations, hoping to bring substantial improvements to future SSP methods and benefit all research and application fields relying on accurate prediction of protein secondary structures.
- Conference Article
8
- 10.23919/indiacom54597.2022.9763114
- Mar 23, 2022
Protein secondary structure prediction is one of the hot research topics in computational biology. Accurate prediction of protein secondary structures provides insights into drug discovery and enzyme design. In addition, it plays an instrumental role in identifying structural classes, protein folds, and three-dimensional structure. However, the experimental determination of protein secondary structures is laborious and costly. Prediction of secondary structures therefore relies heavily on computational techniques. In recent years, deep neural networks have been used extensively for protein secondary structure prediction. However, deep learning models that focus on extracting local dependencies of a protein sequence have difficulty extracting non-local dependencies effectively. Although LSTM recurrent neural networks address the problem of handling long-range dependencies, these models suffer from vanishing gradients, exploding gradients, and shallow layers. Moreover, these models fail to capture dependencies that are very long. In this paper, we propose an attention-augmented deep CNN-LSTM method to circumvent the issues faced by LSTM RNNs. Our proposed model is able to efficiently capture both local and long-range dependencies to enhance the prediction of secondary structures. Experiments were conducted on the CB6133, CB513, CASP10, and CASP11 benchmark datasets. The experimental results indicate that our method performs better than the baseline methods.
- Research Article
6
- 10.1007/s12038-007-0093-1
- Aug 1, 2007
- Journal of Biosciences
Protein secondary structure predictions and amino acid long-range contact map predictions from the primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D residue contact prediction methods for modelling protein structures, we have used the known Q3 (alpha-helix, beta-strands, and irregular turns/loops) secondary structure information, along with residue-residue contact information, as restraints for MODELLER. We present here the results of our modelling studies on 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136, and 193 amino acid residues have RMSDs of 4.17, 5.27, 9.12, 7.89, and 9.69, respectively. The results show that one can obtain better models for proteins with a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template, from which it derives numerous restraints almost identical to those of the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, for up to 30% of the sequence length, with fully known secondary structural information, we end up predicting model structures quite distant from their corresponding native structures.
- Research Article
3
- 10.1371/journal.pone.0254555.r004
- Jul 14, 2021
- PLoS ONE
The secondary structure prediction (SSP) of proteins has long been an essential structural biology technique with various applications. Despite its vital role in many research and industrial fields, in recent years, as the accuracy of state-of-the-art secondary structure predictors approaches the theoretical upper limit, SSP has been considered no longer challenging or too challenging to make advances. With the belief that the substantial improvement of SSP will move forward many fields depending on it, we conducted this study, which focused on three issues that have not been noticed or thoroughly examined yet but may have affected the reliability of the evaluation of previous SSP algorithms. These issues are all about the sequence homology between or within the developmental and evaluation datasets. We thus designed many different homology layouts of datasets to train and evaluate SSP prediction models. Multiple repeats were performed in each experiment by random sampling. The conclusions obtained with small experimental datasets were verified with large-scale datasets using state-of-the-art SSP algorithms. Very different from the long-established assumption, we discover that the sequence homology between query datasets for training, testing, and independent tests exerts little influence on SSP accuracy. Besides, the sequence homology redundancy between or within most datasets would make the accuracy of an SSP algorithm overestimated, while the redundancy within the reference dataset for extracting predictive features would make the accuracy underestimated. Since the overestimating effects are more significant than the underestimating effect, the accuracy of some SSP methods might have been overestimated. 
Based on the discoveries, we propose a rigorous procedure for developing SSP algorithms and making reliable evaluations, hoping to bring substantial improvements to future SSP methods and benefit all research and application fields relying on accurate prediction of protein secondary structures.
- Research Article
1
- 10.32604/cmc.2022.026408
- Jan 1, 2022
- Computers, Materials & Continua
The secondary structure of a protein is critical for establishing a link between the protein's primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most of the existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks. Different deep learning architectures have already been applied to the protein secondary structure prediction problem. In this study, deep learning based models, i.e., a convolutional neural network and a long short-term memory network, were proposed for protein secondary structure prediction. The inputs to the proposed models are amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross validation was employed to find the best parameters for the proposed models. The proposed models enable effective processing of amino acids and attain approximately 87.05% and 87.47% Q3 accuracy of protein secondary structure prediction for the convolutional neural network and long short-term memory models, respectively.
- Research Article
5
- 10.1371/journal.pone.0255076.r004
- Jul 28, 2021
- PLoS ONE
Protein secondary structure prediction (SSP) has a variety of applications; however, there has been relatively limited improvement in accuracy for years. With a vision of moving forward all related fields, we aimed to make a fundamental advance in SSP. There have been many admirable efforts made to improve the machine learning algorithm for SSP. This work thus took a step back by manipulating the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, based on which a new set of machine learning features can be established. The feasibility of this new PSSM was evaluated by rigid independent tests with training and testing datasets sharing <25% sequence identities. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. This new PSSM can be easily combined with the amino acid PSSM, and the improvement in accuracy was remarkable. Preliminary tests made by combining the SSE-PSSM and well-known SSP methods showed 2.0% and 5.2% average improvements in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break the current restriction and eventually bring benefit to all research and applications where secondary structure prediction plays a vital role during development. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating SSE-PSSM available at http://10.life.nctu.edu.tw/SSE-PSSM.
- Research Article
11
- 10.1371/journal.pone.0255076
- Jul 28, 2021
- PLOS ONE
Protein secondary structure prediction (SSP) has a variety of applications; however, there has been relatively limited improvement in accuracy for years. With a vision of moving forward all related fields, we aimed to make a fundamental advance in SSP. There have been many admirable efforts made to improve the machine learning algorithm for SSP. This work thus took a step back by manipulating the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, based on which a new set of machine learning features can be established. The feasibility of this new PSSM was evaluated by rigid independent tests with training and testing datasets sharing <25% sequence identities. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. This new PSSM can be easily combined with the amino acid PSSM, and the improvement in accuracy was remarkable. Preliminary tests made by combining the SSE-PSSM and well-known SSP methods showed 2.0% and 5.2% average improvements in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break the current restriction and eventually bring benefit to all research and applications where secondary structure prediction plays a vital role during development. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating SSE-PSSM available at http://10.life.nctu.edu.tw/SSE-PSSM.
- Research Article
31
- 10.1093/bioinformatics/btr611
- Nov 7, 2011
- Bioinformatics
The precise prediction of protein secondary structure is of key importance for the prediction of 3D structure and biological function. Although the development of many excellent methods over the last few decades has allowed the achievement of prediction accuracies of up to 80%, progress seems to have reached a bottleneck, and further improvements in accuracy have proven difficult. We propose for the first time a structural position-specific scoring matrix (SPSSM), and establish an unprecedented database of 9 million sequences and their SPSSMs. This database, when combined with a purpose-designed BLAST tool, provides a novel prediction tool: SPSSMPred. When the SPSSMPred was validated on a large dataset (10,814 entries), the Q3 accuracy of the protein secondary structure prediction was 93.4%. Our approach was tested on the two latest EVA sets; accuracies of 82.7 and 82.0% were achieved, far higher than can be achieved using other predictors. For further evaluation, we tested our approach on newly determined sequences (141 entries), and obtained an accuracy of 89.6%. For a set of low-homology proteins (40 entries), the SPSSMPred still achieved a Q3 value of 84.6%. The SPSSMPred server is available at http://cal.tongji.edu.cn/SPSSMPred/. Contact: lith@tongji.edu.cn
- Abstract
- 10.1016/j.bpj.2016.11.1100
- Feb 1, 2017
- Biophysical Journal
Prediction of Protein Aggregation Propensities using GOR Method