Prediction of protein secondary structure based on an improved channel attention and multiscale convolution module.
Prediction of protein secondary structure is a key problem in protein science. Protein secondary structure prediction (PSSP) aims to construct a function that maps an amino acid sequence to its secondary structure, so that the secondary structure can be obtained from the sequence alone. Driven by deep learning, the prediction accuracy of protein secondary structure has improved greatly in recent years. To explore a new technique for PSSP, this study introduces the concept of an adversarial game into secondary structure prediction and proposes a prediction model based on a conditional generative adversarial network (GAN). We introduce a new multiscale convolution module and an improved channel attention (ICA) module into the generator, which generates the secondary structure, and we design a discriminator that competes against the generator so that the model learns the complicated features of proteins. We then propose a PSSP method based on the proposed multiscale convolution module and ICA module. The experimental results indicate that the conditional GAN-based protein secondary structure prediction (CGAN-PSSP) model is workable and worthy of further study because of the strong feature-learning ability of adversarial learning.
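The abstract does not spell out how the multiscale convolution or channel attention modules are built, so the following is only a generic sketch of the two ideas, not the authors' ICA module. It assumes uniform averaging kernels in place of learned weights, kernel sizes of 3, 5 and 7, and a simplified squeeze-and-excitation style gate with no learned layers; the function names are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """Slide an odd-length kernel over the signal with zero padding (output length = input length)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multiscale_conv(signal, kernel_sizes=(3, 5, 7)):
    """Return one feature channel per kernel size.

    Assumption: uniform averaging kernels stand in for learned weights.
    """
    return [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]


def channel_attention(channels):
    """Simplified channel gating: average each channel, squash with a sigmoid, rescale.

    Assumption: a real attention module would learn the mapping from the
    per-channel averages to the gates; here the sigmoid is applied directly.
    """
    squeezed = [sum(ch) / len(ch) for ch in channels]
    gates = [1.0 / (1.0 + math.exp(-s)) for s in squeezed]
    return [[g * v for v in ch] for g, ch in zip(gates, channels)]


if __name__ == "__main__":
    # A toy per-residue feature track (one value per sequence position).
    track = [0.0, 0.0, 3.0, 0.0, 0.0]
    features = multiscale_conv(track)
    for channel in channel_attention(features):
        print([round(v, 3) for v in channel])
```

Each kernel size sees a different amount of sequence context, and the gate lets the model weight some of those contexts more than others; in a trained network both the kernels and the gates would be learned.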
- # Prediction Of Secondary Structure
- # Protein Secondary Structure Prediction
- # Multiscale Convolution Module
- # Protein Secondary Structure
- # Channel Attention Module
- # Conditional Generative Adversarial Network
- # Secondary Structure
- # Prediction Of Structure
- # Prediction Accuracy Of Secondary Structure
- # Generative Adversarial Network
- Research Article
4
- 10.1080/07391102.2024.2314264
- Feb 3, 2024
- Journal of Biomolecular Structure and Dynamics
Talaromyces marneffei (formerly Penicillium marneffei) is an endemic pathogenic fungus in Southern China and Southeast Asia. It can cause disease in patients with travel-related exposure to this organism and causes high morbidity and mortality in patients with acquired immune deficiency syndrome (AIDS). In this study, we analyzed the structure and function of a hypothetical protein from T. marneffei using several bioinformatics tools and servers to unveil novel pharmacological targets and design a peptide vaccine against specific epitopes. A total of seven functional epitopes were screened on the protein, and ‘STGVDMWSV’ was the most antigenic, non-allergenic and non-toxic. Molecular docking showed strong affinity between the CTL epitope ‘STGVDMWSV’ and the MHC I allele HLA-A*02:01, with a docking score of −234.98 kcal/mol, and a 100 ns molecular dynamics simulation revealed stable interactions. Overall, the results of this study revealed that this hypothetical protein is crucial for comprehending biochemical and physiological pathways and for identifying novel therapeutic targets for human health.
- Research Article
- 10.1016/j.compbiomed.2025.110457
- Sep 1, 2025
- Computers in biology and medicine
DCBLSTM-Deep Convolutional Bidirectional Long Short-Term Memory neural network for Q8 secondary protein structure prediction.
- Research Article
1
- 10.18502/ijpa.v18i3.13753
- Oct 4, 2023
- Iranian Journal of Parasitology
We aimed to design a B and T cell recombinant protein vaccine of Toxoplasma gondii with in silico approach. MIC13 plays an important role in spreading the parasite in the host body. GRA1 causes the persistence of the parasite in the parasitophorous vacuole. SAG1 plays a role in host-cell adhesion and cell invasion. Amino acid positions 73-272 from MIC13, 71-190 from GRA1, and 101-300 from SAG1 were selected and joined with linker A(EAAAK)A. The structures, antigenicity, allergenicity, physicochemical properties, as well as codon optimization and mRNA structure of this recombinant protein called MGS1, were predicted using bioinformatics servers. The designed structure was synthesized and then cloned in pET28a (+) plasmid and transformed into Escherichia coli BL21. The number of amino acids in this antigen was 555, and its antigenicity was estimated to be 0.6340. SDS-PAGE and Western blotting confirmed gene expression and successful production of the protein with a molecular weight of 59.56kDa. This protein will be used in our future studies as an anti-Toxoplasma vaccine candidate in animal models. In silico methods are efficient for understanding information about proteins, selecting immunogenic epitopes, and finally producing recombinant proteins, as well as reducing the time and cost of vaccine design.
- Research Article
3
- 10.1007/s00521-024-09822-8
- May 13, 2024
- Neural Computing and Applications
An improved multi-scale convolutional neural network with gated recurrent neural network model for protein secondary structure prediction
- Research Article
3
- 10.1007/978-1-0716-4213-9_1
- Nov 14, 2024
- Methods in molecular biology (Clifton, N.J.)
The secondary structures (SSs) and supersecondary structures (SSSs) underlie the three-dimensional structure of proteins. Prediction of the SSs and SSSs from protein sequences enjoys high levels of use and finds numerous applications in the development of a broad range of other bioinformatics tools. Numerous sequence-based predictors of SS and SSS were developed and published in recent years. We survey and analyze 45 SS predictors that were released since 2018, focusing on their inputs, predictive models, scope of their prediction, and availability. We also review 32 sequence-based SSS predictors, which primarily focus on predicting coiled coils and beta-hairpins and which include five methods that were published since 2018. Substantial majority of these predictive tools rely on machine learning models, including a variety of deep neural network architectures. They also frequently use evolutionary sequence profiles. We discuss details of several modern SS and SSS predictors that are currently available to the users and which were published in higher impact venues.
- Research Article
- 10.1038/s41598-025-17513-0
- Aug 31, 2025
- Scientific Reports
The secondary structure of a protein serves as the foundation for constructing its three-dimensional (3D) structure, which in turn is critical for determining its function and role in biological processes. Therefore, accurately predicting secondary structure not only facilitates the understanding of a protein’s 3D conformation but also provides essential insights into its interactions, functional mechanisms, and potential applications in biomedical research. Deep learning models are particularly effective in protein secondary structure prediction because of their ability to process complex sequence data and extract meaningful patterns, thereby increasing prediction accuracy and efficiency. This study proposes a combined model, ITBM-KD, which integrates an improved temporal convolutional network (TCN), bidirectional recurrent neural network (BiRNN), and multilayer perceptron (MLP) to increase the accuracy of protein secondary structure prediction for octapeptides and tripeptides. By combining one-hot encoding, word vector representation of physicochemical properties, and knowledge distillation with the ProtT5 model, the proposed model achieves excellent performance on multiple datasets. To evaluate its effectiveness, two classic datasets, TS115 and CB513, containing 115 and 513 protein datasets, respectively, were used. In addition, 15,078 protein data points collected from the PDB database from June 6, 2018, to June 6, 2020, were used to further verify the robustness and generalizability of the model. This study improves prediction accuracy and provides an essential model for understanding protein structure and function, especially in resource-limited settings.
- Book Chapter
2
- 10.1007/978-981-99-9621-6_22
- Jan 1, 2024
AI-Assisted Methods for Protein Structure Prediction and Analysis
- Research Article
1
- 10.1109/embc40787.2023.10340202
- Jul 24, 2023
- Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
Tissue-mimicking dielectric phantoms are widely used to mimic the relative permittivity and conductivity of human tissues in various medical applications. The combination of artificial materials determines the characteristics of dielectric phantoms. However, a method that reliably determines the composition of artificial materials for designed values of dielectric properties and frequency is still lacking. In this work, we propose a method that easily determines the composition of a phantom that mimics human tissues from 16 MHz to 3 GHz.
- Research Article
4
- 10.1038/s41598-024-67403-0
- Jul 17, 2024
- Scientific Reports
Secondary structure prediction is a key step in understanding protein function and biological properties and is highly important in the fields of new drug development, disease treatment, bioengineering, etc. Accurately predicting the secondary structure of proteins helps to reveal how proteins are folded and how they function in cells. The application of deep learning models in protein structure prediction is particularly important because of their ability to process complex sequence information and extract meaningful patterns and features, thus significantly improving the accuracy and efficiency of prediction. In this study, a combined model integrating an improved temporal convolutional network (TCN), bidirectional long short-term memory (BiLSTM), and a multi-head attention (MHA) mechanism is proposed to enhance the accuracy of protein prediction in both eight-state and three-state structures. One-hot encoding features and word vector representations of physicochemical properties are incorporated. A significant emphasis is placed on knowledge distillation techniques utilizing the ProtT5 pretrained model, leading to performance improvements. The improved TCN, achieved through multiscale fusion and bidirectional operations, allows for better extraction of amino acid sequence features than traditional TCN models. The model demonstrated excellent prediction performance on multiple datasets. For the TS115, CB513 and PDB (2018–2020) datasets, the prediction accuracy of the eight-state structure of the six datasets in this paper reached 88.2%, 84.9%, and 95.3%, respectively, and the prediction accuracy of the three-state structure reached 91.3%, 90.3%, and 96.8%, respectively. 
This study not only improves the accuracy of protein secondary structure prediction but also provides a valuable tool for understanding protein structure and function, one that is particularly applicable to resource-constrained contexts.
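Eight-state and three-state accuracies such as those quoted above are per-residue accuracies: the percentage of residues whose predicted state matches the observed state. A minimal sketch, assuming predictions and observations are given as equal-length strings of state letters (the function name is hypothetical):

```python
def q_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Works for Q3 or Q8 alike; only the alphabet of state letters differs.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy three-state example: H = helix, E = strand, C = coil.
    print(q_accuracy("HHHEECCC", "HHHEEECC"))
```

In the toy example seven of the eight residues agree, so the function returns 87.5.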
- Book Chapter
5
- 10.1016/b978-0-443-22299-3.00014-1
- Jan 1, 2024
- Deep Learning Applications in Translational Bioinformatics
Chapter 14 - Generative adversarial networks in protein and ligand structure generation: a case study
- Research Article
8
- 10.1007/s00500-022-06783-9
- Feb 12, 2022
- Soft Computing
Protein secondary structure (PSS) prediction has emerged as a hot topic in bioinformatics. PSS helps to predict the tertiary structure and to understand protein structures, which in turn helps in designing various drugs. The existing PSS prediction techniques achieve a Q3 accuracy of nearly 80% and have not improved since. In this paper, we propose a novel technique that uses amino acid sequences alone as the input feature, and the respective feature vector matrix is passed through a deep learning model (DLM) for PSS prediction. We use one-hot encoding and LSTM (Long Short-Term Memory) to predict PSS, which helps to achieve higher accuracy. The one-hot encoder is used to extract the local contexts of amino acid sequences, and the LSTM captures the long-distance interdependencies among amino acids. The overall implementation is carried out in MATLAB 2020a. The performance of this model is evaluated in terms of precision, recall, F1-score, and the accuracy of both Q3 and Q8 secondary structure predictions. The proposed scheme achieves Q3 accuracies of 86.54%, 85.2% and 85.7% and Q8 accuracies of 77.8%, 72.5% and 74.9% on the benchmark datasets CullPDB, CASP10, and CASP11, respectively. Advantages of the proposed scheme include reduced computation time and better accuracy than the other baseline models for PSS prediction.
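One-hot encoding of an amino acid sequence, as used for the input features above, can be sketched as follows. The paper's implementation is in MATLAB and is not shown; this Python version assumes the 20 standard amino acids in alphabetical one-letter order and encodes any other symbol as an all-zero row, both of which are illustrative choices.

```python
# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an L x 20 matrix (list of rows) for a sequence of length L.

    Symbols outside the 20 standard letters (e.g. X) become all-zero rows.
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in one_hot_encode("ACX"):
        print(row)
```

The resulting matrix has one row per residue, which is the shape a sequence model such as an LSTM consumes one position at a time.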
- Book Chapter
4
- 10.1007/978-3-642-04759-6_5
- Jan 1, 2009
Accurate protein secondary structure prediction from the amino acid sequence is essential for almost all theoretical and experimental studies on protein structure and function. After a brief discussion of application of data mining for optimization of crystallization conditions for target proteins we show that data mining of structural fragments of proteins from known structures in the protein data bank (PDB) significantly improves the accuracy of secondary structure predictions. The original method was proposed by us a few years ago and was termed fragment database mining (FDM) (Cheng H, Sen TZ, Kloczkowski A, Margaritis D, Jernigan RL (2005) Prediction of protein secondary structure by mining structural fragment database. Polymer 46:4314–4321). This method gives excellent accuracy for predictions if similar sequence fragments are available in our library of structural fragments, but is less successful if such fragments are absent in the fragments database. Recently we have improved secondary structure predictions further by combining FDM with classical GOR V (Kloczkowski A, Ting KL, Jernigan RL, Garnier J (2002a) Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins 49:154–66; Sen TZ, Jernigan RL, Garnier J, Kloczkowski A (2005) GOR V server for protein secondary structure prediction. Bioinformatics 21:2787–8) predictions to form a combined method, so-called consensus database mining (CDM) (Sen TZ, Cheng H, Kloczkowski A, Jernigan RL (2006) A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Sci 15:2499–506). FDM mines the structural segments of PDB, and utilizes structural information from the matching sequence fragments for the prediction of protein secondary structures. 
By combining it with the GOR V secondary structure prediction method, which is based on information theory and Bayesian statistics, coupled with evolutionary information from multiple sequence alignments (MSA), our CDM method guarantees improved accuracies of prediction. Additionally, with the constant growth in the number of new protein structures and folds in the PDB, the accuracy of the CDM method is clearly expected to increase in future. We have developed a publicly available CDM server (Cheng H, Sen TZ, Jernigan RL, Kloczkowski A (2007) Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: combining GOR V and Fragment Database Mining (FDM). Bioinformatics 23:2628–30) (http://gor.bb.iastate.edu/cdm) for protein secondary structure prediction.
- Research Article
65
- 10.1016/j.knosys.2016.11.015
- Nov 17, 2016
- Knowledge-Based Systems
Protein secondary structure prediction by using deep learning method
- Research Article
117
- 10.1186/1471-2105-8-201
- Jun 14, 2007
- BMC Bioinformatics
Background: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion: The predictive systems are publicly available at the address .
- Abstract
- 10.1016/j.bpj.2017.11.2393
- Feb 1, 2018
- Biophysical Journal
Combining Prediction of Protein Aggregation Propensities with Prediction of Other One-Dimensional Properties
- Conference Article
8
- 10.23919/indiacom54597.2022.9763114
- Mar 23, 2022
Protein secondary structure prediction is one of the hot research topics in computational biology. Accurate prediction of protein secondary structures provides insights into drug discovery and enzyme design. In addition, it plays an instrumental role in identifying structural classes, protein folds, and three-dimensional structure. However, the experimental determination of protein secondary structures is laborious and costly, so the field relies heavily on computational techniques for predicting secondary structures. In recent years, deep neural networks have been used extensively for protein secondary structure prediction. However, deep learning models that focus on extracting local dependencies of a protein sequence have difficulty extracting non-local dependencies effectively. Although LSTM recurrent neural networks address long-range dependencies, these models suffer from vanishing gradients, exploding gradients and shallow layers. Moreover, these models fail to capture very long dependencies. In this paper, we propose an attention-augmented deep CNN-LSTM method to circumvent the issues faced by LSTM RNNs. Our proposed model efficiently captures both local and long-range dependencies to enhance the prediction of secondary structures. Experiments were conducted on the CB6133, CB513, CASP10 and CASP11 benchmark datasets. The experimental results indicate that our method performs better than the baseline methods.
- Research Article
21
- 10.1093/bioinformatics/9.2.147
- Jan 1, 1993
- Bioinformatics
We have studied the prediction of globular protein secondary structures by neural networks. Protein secondary structures are allocated to amino acid residues using Kabsch and Sander's dictionary of protein secondary structures and the neural network is taught the protein secondary structures. The input layer of the neural network allows sequences of residues including 20 amino acids, chain break, B, X and Z. We consider classifying secondary structures into groups of 3, 4 and 8. In each case, we calculate the percentage of correct predictions. We discuss the effect of overlearning on the protein secondary structure prediction. In addition, we include the application of a neural network with a modular architecture to prediction of protein secondary structures. We compare the results from neural networks with a modular architecture and with a simple three-layer structure.
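The Kabsch and Sander dictionary (DSSP) assigns eight states, and three-state labels are usually derived from them by a fixed reduction. The abstract does not say which reduction was used, so the mapping below is one widely used convention (H, G, I to helix; E, B to strand; everything else to coil) and should be read as an assumption; other groupings exist.

```python
# Assumption: a common 8-to-3 reduction; papers differ on where G, I and B go.
DSSP_TO_Q3 = {
    "H": "H",  # alpha helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi helix
    "E": "E",  # extended strand
    "B": "E",  # isolated beta bridge
    "T": "C",  # turn
    "S": "C",  # bend
    "-": "C",  # no assigned structure
}


def reduce_to_q3(dssp_states):
    """Map a string of DSSP state letters to H/E/C; unknown symbols become coil."""
    return "".join(DSSP_TO_Q3.get(state, "C") for state in dssp_states)


if __name__ == "__main__":
    print(reduce_to_q3("HGIEBTS-"))
```

Because the choice of reduction changes which residues count as helix or strand, accuracies reported under different conventions are not directly comparable.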
- Research Article
7
- 10.1371/journal.pone.0254555
- Jul 14, 2021
- PloS one
The secondary structure prediction (SSP) of proteins has long been an essential structural biology technique with various applications. Despite its vital role in many research and industrial fields, in recent years, as the accuracy of state-of-the-art secondary structure predictors approaches the theoretical upper limit, SSP has been considered no longer challenging or too challenging to make advances. With the belief that the substantial improvement of SSP will move forward many fields depending on it, we conducted this study, which focused on three issues that have not been noticed or thoroughly examined yet but may have affected the reliability of the evaluation of previous SSP algorithms. These issues are all about the sequence homology between or within the developmental and evaluation datasets. We thus designed many different homology layouts of datasets to train and evaluate SSP prediction models. Multiple repeats were performed in each experiment by random sampling. The conclusions obtained with small experimental datasets were verified with large-scale datasets using state-of-the-art SSP algorithms. Very different from the long-established assumption, we discover that the sequence homology between query datasets for training, testing, and independent tests exerts little influence on SSP accuracy. Besides, the sequence homology redundancy between or within most datasets would make the accuracy of an SSP algorithm overestimated, while the redundancy within the reference dataset for extracting predictive features would make the accuracy underestimated. Since the overestimating effects are more significant than the underestimating effect, the accuracy of some SSP methods might have been overestimated. 
Based on the discoveries, we propose a rigorous procedure for developing SSP algorithms and making reliable evaluations, hoping to bring substantial improvements to future SSP methods and benefit all research and application fields relying on accurate prediction of protein secondary structures.
- Book Chapter
27
- 10.1007/978-3-319-12883-2_19
- Nov 30, 2014
Correct prediction of secondary and tertiary structure of proteins is one of the major challenges in bioinformatics/computational biological research. Predicting the correct secondary structure is the key to predict a good/satisfactory tertiary structure of the protein which not only helps in prediction of protein function but also in prediction of sub-cellular localization. This chapter aims to explain the different algorithms and methodologies, which are used in secondary structure prediction. Similarly, tertiary structure prediction has also emerged as one of developing areas of bioinformatics/computational biological research owing to the large gap between the available number of protein sequences and the known experimentally solved structures. Because of time and cost intensive experimental methods, experimentally determined structures are not available for vast majority of the available protein sequences present in public domain databases. The primary aim of this chapter is to offer a detailed conceptual insight to the algorithms used for protein secondary and tertiary structure prediction. This chapter systematically illustrates flowchart for selecting the most accurate prediction algorithm among different categories for the target sequence against three categories of tertiary structure prediction methods. Out of the three methods, homology modeling which is considered as most reliable method is discussed in detail followed by strengths and limitations for each of these categories. This chapter also explains different practical and conceptual problems, obstructing the high accuracy of the protein structure in each of the steps for all the three methods of tertiary structure prediction. The popular hybrid methodologies which further club together a number of features such as structural alignments, solvent accessibility and secondary structure information are also discussed. 
Moreover, this chapter describes the meta-servers that generate a consensus result from many servers to build a high-accuracy protein model. Lastly, the scope for further research to bridge existing gaps and develop better secondary and tertiary structure prediction algorithms is highlighted.
- Conference Article
5
- 10.1109/cibcb.2016.7758118
- Oct 1, 2016
In this paper, we propose an ab initio two-stage protein secondary structure (PSS) prediction model through a novel framework of PSS transition site prediction by using Artificial Neural Networks (ANNs) and Genetic Programming (GP). In the proposed classifier, protein sequences are encoded by new amino acid encoding schemes derived from genetic codon mappings, clustering and information theory. In the first stage, sequence segments are mapped to regions in the Ramachandran map (2D plot), and weight scores are computed by using statistical information derived from clusters. In addition, score vectors are constructed for the mapped regions using the weight scores and PSS transition sites. The score vectors have fewer dimensions compared to those of commonly used encoding schemes and protein profile. In the second stage, a two-tier classifier is employed based on an ANN and a GP method. The performance of the two-stage classifier is compared to the state-of-the-art cascaded Machine Learning methods which commonly employ ANNs. The prediction method is examined with the latest dataset of nonhomologous protein sequences, PISCES [1]. The experimental results and statistical analyses indicate a significantly higher distribution of Q3 scores, approximately 7% with p-value < 0.001, in comparison to that of cascaded ANN architectures. PSS transition sites are valuable information about the topological property of protein sequences and incorporating the information improves the overall performance of the PSS prediction model.
- Research Article
3
- 10.1007/s13721-021-00304-8
- Apr 30, 2021
- Network Modeling Analysis in Health Informatics and Bioinformatics
Proteins form the basis of all major processes that sustain life. The functionality of a protein is a direct consequence of its underlying structure. Protein structure prediction thus serves to ascertain the function of similar or dissimilar proteins. Secondary structure prediction paves the way for 3D structures, which eventually decide protein properties. It also aims to provide probable structures for proteins whose structures remain undiscovered. Although experimental approaches have been quite effective in determining the secondary structure of a protein, they are often cumbersome and time-intensive in vitro. Hence, computational approaches are required to predict secondary structures for the diverse amino acids constituting these proteins. However, the available computational models fail to achieve good prediction accuracy because they model the sequence-structure relationship inadequately. The dearth of global exploration-based methods also makes them ineffective in handling evolving proteomic data. Accordingly, particle swarm optimization (PSO) has been explored to propose a neural network model for protein secondary structure prediction (PSSP). Six standard datasets, namely PSS504, RS126, EVA6, CB396, Manesh and CB513, have been used for training and testing the neural network. The proposed model is evaluated on the basis of its Q3 accuracy, precision, and recall. 10-, 20-, 30- and 40-fold cross-validation, in combination with sensitivity analysis, has been carried out to verify the results. The proposed model is found to outperform most of the existing models, with an average Q3 accuracy above 81% for PSSP.
- Conference Article
- 10.1109/icmlc.2005.1527519
- Jan 1, 2005
Prediction of protein secondary structure has not been resolved in bioinformatics for over thirty years. Numerous methods have been developed to conquer this problem so far, but the results of most methods are not satisfactory. The Chou-Fasman method is simple, straightforward, and instructive to biologists and chemists, although its prediction accuracy is not as good as some newly developed learning algorithms such as neural network and SVM. This article presents the first attempt to predict protein secondary structure by means of PBIL algorithm. The idea is to predict the secondary structure by statistically optimal functions based on rules derived from the sequence-structure data. These rules, as part of optimal or tabu functions, are quite important to the success of this algorithm. The concept of probability of secondary structure corresponding to amino acids in sequence has been successfully applied to calculating the optimal function, providing a new approach to prediction of protein secondary structure.
- Conference Article
4
- 10.1109/cibcb.2015.7300327
- Aug 1, 2015
In this paper, we evaluated the performance of an evolutionary-based protein secondary structure (PSS) prediction model which uses the information of amino acid sequences extracted by a clustering technique. The dimension of the classifier's inputs is reduced using a k-means clustering method on sequence segments. The proposed PSS classifier is based on a Genetic Programming (GP) approach that uses IF rules for a multi-target classifier. The GP classifier is evaluated by using protein sequences and the sequence information obtained from the k-means clustering. The GP prediction model's performance is compared with those of feed-forward artificial neural networks (ANNs) and support vector machines (SVMs). The prediction methods are examined with two protein datasets, RS126 and CB513. The performance of the three classification models is measured according to Q3 and segment overlap (SOV) scores. The prediction models which use clustered data yield on average 2% higher prediction accuracy than those using sequence data. In addition, the experimental results indicate that the GP model's prediction scores are on average 3% higher than those of the ANN and SVM models when amino acid sequences or clustered information are explored.
- Abstract
- 10.1016/j.bpj.2016.11.1100
- Feb 1, 2017
- Biophysical Journal
Prediction of Protein Aggregation Propensities using GOR Method