Prediction of secondary structural content of proteins from their amino acid composition alone. II. The paradox with secondary structural class.

Abstract

The success rates reported for secondary structural class prediction with different methods are contradictory. On the one hand, the problem of recognizing the secondary structural class of a protein knowing only its amino acid composition appears completely solved by simply applying jury decision with an elliptically scaled distance function. Chou and coworkers have repeatedly published prediction accuracies near 100% (see Crit. Rev. Biochem. Mol. Biol. 30:275-349, 1995). On the other hand, traditional secondary structure prediction techniques achieve success rates of about 70% for the secondary structural state per residue and about 75% for structural class only with extensive input information (full sequence of the query protein, its amino acid composition and length, multiple alignments with homologous sequences). In this article, we resolve the paradox and consider (1) the question of the secondary structural class definition, (2) how well the test set of protein tertiary structures represents the current state of the Protein Data Bank (PDB), and (3) the real impact of amino acid composition on secondary structural class. We formulate three objective criteria for a reasonable definition of secondary structural classes and show that only the criterion of Nakashima et al. (J. Biochem. 99:153-162, 1986) complies with all of them. Only this definition matches the distribution of secondary structural content in representative PDB subsets, whereas other criteria leave many proteins (up to 65% of all PDB entries) simply unassigned. We critically review specialized secondary structural class prediction methods, especially those of Chou and coworkers, which claim almost 100% accuracy using only amino acid composition, and resolve the paradox that these prediction accuracies are better than those from secondary structure predictions from multiple alignments.
We show (i) that these techniques rely on a preselection of test sets which removes irregular proteins and other proteins without any class assignment (about 35% of all PDB entries); and (ii) that even for preselected representative test sets, the success rate drops to 60% and lower for a 4-type classification (alpha, beta, alpha + beta, alpha/beta). The prediction accuracies fall to about 50% if the secondary structural class definition of Nakashima et al. is applied and only a few irregular proteins are preselected and removed from automatically generated, representative subsets of the PDB. We have applied two new vector decomposition methods for secondary structural content prediction from amino acid composition alone, with and without consideration of amino acid compositional coupling in the learning set of tertiary structures, respectively, to the problem of class prediction and achieve about 60% correct assignment among four classes (alpha, beta, mixed, irregular) as well as single sequence-based secondary structure prediction methods like GORIII and COMBI. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein and that consideration of compositional coupling does not improve the prediction success. The prediction program SSCP, which offers secondary structural class assignment for query compositions and sequences, has been made available as a World Wide Web and E-mail service.
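
To make concrete what "class prediction from amino acid composition alone" involves, here is a minimal sketch. It is not the jury decision with an elliptically scaled distance function discussed above, nor the SSCP program; plain Euclidean distance to per-class mean compositions is swapped in as a simpler stand-in. The function names and the toy training sequences are assumptions made up for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-component amino acid composition (fractions summing to 1)."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def class_centroids(training):
    """Average the composition vectors per class; `training` maps label -> sequences."""
    centroids = {}
    for label, seqs in training.items():
        vectors = [composition(s) for s in seqs]
        centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids


def predict_class(sequence, centroids):
    """Assign the class whose centroid is nearest in plain Euclidean distance."""
    query = composition(sequence)
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))


# Hypothetical toy sequences, invented for illustration only (not real proteins).
TOY_TRAINING = {
    "alpha": ["AEELLKKAEELLKK", "ALEKLAEELAKK"],
    "beta": ["VTVTVSGTVYVT", "TVYVSVTGTVTV"],
}
```

Note that the predictor sees only the 20 fractions; the order of residues in the query never enters, which is exactly the restriction the abstract examines.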

References (showing 10 of 32 papers)
  • Cite Count Icon 1185
  • 10.3109/10409239509083488
Prediction of protein structural classes.
  • Jan 1, 1995
  • Critical Reviews in Biochemistry and Molecular Biology
  • Kuo-Chen Chou + 1 more

  • Cite Count Icon 66
The DEF data base of sequence based protein fold class predictions.
  • Sep 1, 1994
  • Nucleic acids research
  • M Reczko + 1 more

  • Cite Count Icon 10
  • 10.1007/bf01886788
An eigenvalue-eigenvector approach to predicting protein folding types
  • Jul 1, 1995
  • Journal of Protein Chemistry
  • Chun-Ting Zhang + 1 more

  • Cite Count Icon 9475
  • 10.1016/s0022-2836(77)80200-3
The protein data bank: A computer-based archival file for macromolecular structures
  • May 1, 1977
  • Journal of Molecular Biology
  • Frances C Bernstein + 8 more

  • Cite Count Icon 72
  • 10.1111/j.1432-1033.1992.tb17067.x
A correlation-coefficient method to predicting protein-structural classes from amino acid compositions.
  • Jul 1, 1992
  • European Journal of Biochemistry
  • Kuo‐Chen Chou + 1 more

  • Cite Count Icon 95
  • 10.1016/0959-440x(95)80099-9
Protein secondary structure prediction
  • Jun 1, 1995
  • Current Opinion in Structural Biology
  • Geoffrey J Barton

  • Cite Count Icon 752
  • 10.1016/0022-2836(90)90154-e
Improvements in protein secondary structure prediction by an enhanced neural network
  • Jul 1, 1990
  • Journal of Molecular Biology
  • D.G Kneller + 2 more

  • Cite Count Icon 19
  • 10.1093/protein/8.6.505
Accurate prediction of protein secondary structural class with fuzzy structural vectors.
  • Jan 1, 1995
  • Protein engineering
  • Jorma Boberg + 2 more

  • Cite Count Icon 49
  • 10.1002/bip.360241011
Amino acid composition and hydrophobicity patterns of protein domains correlate with their structures.
  • Oct 1, 1985
  • Biopolymers
  • Robert P Sheridan + 4 more

  • Cite Count Icon 119
  • 10.1093/protein/6.8.849
Quantification of secondary structure prediction improvement using multiple alignments.
  • Jan 1, 1993
  • Protein Engineering, Design and Selection
  • Jonathan M Levin + 3 more

Citations (showing 10 of 77 papers)
  • Research Article
  • Cite Count Icon 386
  • 10.1023/a:1020713915365
An intriguing controversy over protein structural class prediction.
  • Nov 1, 1998
  • Journal of Protein Chemistry
  • Guo-Ping Zhou

A recent report by Bahar et al. [(1997), Proteins 29, 172-185] indicates that the coupling effects among different amino acid components as originally formulated by K. C. Chou [(1995), Proteins 21, 319-344] are important for improving the prediction of protein structural classes. These authors have further proposed a compact lattice model to illuminate the physical insight contained in the component-coupled algorithm. However, a completely opposite conclusion was reached by Eisenhaber et al. [(1996), Proteins 25, 169-179], using a different dataset constructed according to their definition. To address such an intriguing controversy, tests were conducted by various approaches for the datasets from an objective database, the SCOP database [Murzin et al. (1995), J. Mol. Biol. 247, 536-540]. The results obtained by both self-consistency and jackknife tests indicate that the overall rates of correct prediction by the algorithm incorporating the coupling effect among different amino acid components are significantly higher than those by the algorithms without counting such an effect. This is fully consistent with the physical reality that the folding of a protein is the result of a collective interaction among its constituent amino acid residues, and hence the coupling effects of different amino acid components must be incorporated in order to improve the prediction quality. By revisiting the calculation procedures of Eisenhaber et al., it was found that there was a conceptual mistake in constructing the structural class datasets and a systematic mistake in applying the component-coupled algorithm. These findings are informative for understanding and utilizing the component-coupled algorithm to study the structural classes of proteins.
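
The two evaluation protocols named in this abstract, the self-consistency test and the jackknife test, can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the component-coupled algorithm, and the 2-D feature vectors are hypothetical toy data, not compositions from SCOP or the PDB.

```python
import math


def nearest_centroid(train, query):
    """Predict the label whose mean feature vector is closest to `query`."""
    groups = {}
    for vector, label in train:
        groups.setdefault(label, []).append(vector)
    centroids = {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))


def self_consistency_accuracy(data):
    """Learn from all samples, then predict those same samples."""
    hits = sum(nearest_centroid(data, v) == y for v, y in data)
    return hits / len(data)


def jackknife_accuracy(data):
    """Leave each sample out of the learning set in turn, then predict it."""
    hits = 0
    for i, (vector, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += nearest_centroid(rest, vector) == label
    return hits / len(data)


# Hypothetical 2-D feature vectors, invented for illustration only.
TOY_DATA = [
    ([0.0, 0.0], "a"), ([0.2, 0.0], "a"), ([0.9, 0.2], "a"),
    ([1.0, 1.0], "b"), ([0.9, 1.0], "b"), ([0.7, 0.8], "b"),
]
```

On this toy data the jackknife estimate comes out lower than the self-consistency estimate, because the outlying sample in class "a" is only classified correctly when it helps define its own centroid.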

  • Research Article
  • Cite Count Icon 31
  • 10.1155/s1110724303209232
Proteomics in Vaccinology and Immunobiology: An Informatics Perspective of the Immunone
  • Dec 4, 2003
  • Journal of Biomedicine and Biotechnology
  • Irini A Doytchinova + 2 more

The postgenomic era, as manifest, inter alia, by proteomics, offers unparalleled opportunities for the efficient discovery of safe, efficacious, and novel subunit vaccines targeting a tranche of modern major diseases. A negative corollary of this opportunity is the risk of becoming overwhelmed by this embarrassment of riches. Informatics techniques, working to address issues of both data management and through prediction to shortcut the experimental process, can be of enormous benefit in leveraging the proteomic revolution. In this disquisition, we evaluate proteomic approaches to the discovery of subunit vaccines, focussing on viral, bacterial, fungal, and parasite systems. We also adumbrate the impact that proteomic analysis of host-pathogen interactions can have. Finally, we review relevant methods to the prediction of immunome, with special emphasis on quantitative methods, and the subcellular localization of proteins within bacteria.

  • Research Article
  • Cite Count Icon 43
  • 10.1182/blood.v100.3.1026
Deletion of leucine 61 in glucose-6-phosphate dehydrogenase leads to chronic nonspherocytic anemia, granulocyte dysfunction, and increased susceptibility to infections
  • Jul 18, 2002
  • Blood
  • Robin Van Bruggen

  • Research Article
  • Cite Count Icon 70
  • 10.1182/blood.v94.9.2955
Molecular Basis and Enzymatic Properties of Glucose 6-Phosphate Dehydrogenase Volendam, Leading to Chronic Nonspherocytic Anemia, Granulocyte Dysfunction, and Increased Susceptibility to Infections
  • Dec 14, 2020
  • Blood
  • Dirk Roos + 15 more

  • Research Article
  • Cite Count Icon 19
  • 10.1080/15384047.2015.1056407
Anti-proliferative effect on a colon adenocarcinoma cell line exerted by a membrane disrupting antimicrobial peptide KL15
  • Jul 6, 2015
  • Cancer Biology & Therapy
  • Yu-Ching Chen + 3 more

The antimicrobial and anticancer activities of an antimicrobial peptide (AMP) KL15, which we obtained through in silico modification of the sequences of 2 previously identified bacteriocins, m2163 and m2386, from Lactobacillus casei ATCC 334, have been studied. While significant bactericidal effect on the pathogenic bacteria Listeria, Escherichia, Bacillus, Staphylococcus, Enterococcus is exerted by KL15, the AMP can also kill 2 human adenocarcinoma cells SW480 and Caco-2 with measured IC50 as 50 μg/ml or 26.3 μM. However, the IC50 determined for KL15 on killing the normal human mammary epithelial cell H184B5F5/M10 is 150 μg/ml. The conformation of KL15 dissolved in 50% 2,2,2-trifluoroethanol or in 2 large unilamellar vesicle systems determined by circular dichroism spectroscopy appears to be helical. Further, the cell membrane permeability of SW480 cells treated with KL15 appears to be significantly enhanced as studied by both flow cytometry and confocal microscopy. As observed under a scanning electron microscope, the morphology of treated SW480 cells also changes significantly as the treatment time with 80 μg/ml KL15 increases. KL15 appears to be able to pierce the cell membrane of treated SW480 cells so that numerous porous structures are generated and observable. Therefore, KL15 is likely to kill the treated SW480 cells through the necrotic pathway similar to some recently identified AMPs by others.

  • Research Article
  • Cite Count Icon 3
  • 10.1002/prot.22788
Fold homology detection using sequence fragment composition profiles of proteins
  • Aug 16, 2010
  • Proteins: Structure, Function, and Bioinformatics
  • Armando D Solis + 1 more

The effectiveness of sequence alignment in detecting structural homology among protein sequences decreases markedly when pairwise sequence identity is low (the so-called "twilight zone" problem of sequence alignment). Alternative sequence comparison strategies able to detect structural kinship among highly divergent sequences are necessary to address this need. Among them are alignment-free methods, which use global sequence properties (such as amino acid composition) to identify structural homology in a rapid and straightforward way. We explore the viability of using tetramer sequence fragment composition profiles in finding structural relationships that lie undetected by traditional alignment. We establish a strategy to recast any given protein sequence into a tetramer sequence fragment composition profile, using a series of amino acid clustering steps that have been optimized for mutual information. Our method has the effect of compressing the set of 160,000 unique tetramers (if using the 20-letter amino acid alphabet) into a more tractable number of reduced tetramers (approximately 15-30), so that a meaningful tetramer composition profile can be constructed. We test remote homology detection at the topology and fold superfamily levels using a comprehensive set of fold homologs, culled from the CATH database that share low pairwise sequence similarity. Using the receiver-operating characteristic measure, we demonstrate potentially significant improvement in using information-optimized reduced tetramer composition, over methods relying only on the raw amino acid composition or on traditional sequence alignment, in homology detection at or below the "twilight zone".
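
The recasting step described in this abstract, mapping each residue to a cluster label and then counting overlapping tetramers, can be sketched as below. The two-group clustering used here is a hypothetical choice for illustration only; the paper derives its clusters by optimizing mutual information, and that procedure is not reproduced. With two groups there are 2^4 = 16 reduced tetramers, instead of 20^4 = 160,000 for the full alphabet.

```python
from collections import Counter
from itertools import product

# Hypothetical two-group clustering (H = hydrophobic, P = polar/other),
# chosen only for illustration; it is not the paper's optimized clustering.
GROUPS = {"H": "AVLIMFWC", "P": "GSTYNQDEKRHP"}
REDUCE = {aa: group for group, members in GROUPS.items() for aa in members}


def reduce_sequence(sequence):
    """Replace each standard residue by its cluster label; skip unknown symbols."""
    return "".join(REDUCE[aa] for aa in sequence.upper() if aa in REDUCE)


def tetramer_profile(sequence, k=4):
    """Return the relative frequency of every reduced k-mer (overlapping windows)."""
    reduced = reduce_sequence(sequence)
    windows = [reduced[i:i + k] for i in range(len(reduced) - k + 1)]
    if not windows:
        raise ValueError("sequence is shorter than the fragment length")
    counts = Counter(windows)
    keys = ["".join(p) for p in product(sorted(GROUPS), repeat=k)]
    # Counter returns 0 for fragments that never occur, so every key is present.
    return {key: counts[key] / len(windows) for key in keys}
```

Two sequences can then be compared through their profiles, for example with a distance between the two frequency vectors, without aligning them.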

  • Research Article
  • Cite Count Icon 116
  • 10.1002/(sici)1097-0134(19980401)31:1<97::aid-prot8>3.0.co;2-e
Prediction and classification of domain structural classes
  • Apr 1, 1998
  • Proteins: Structure, Function, and Genetics
  • Kuo-Chen Chou + 3 more

Can the coupling effect among different amino acid components be used to improve the prediction of protein structural classes? The answer is yes according to the study by Chou and Zhang (Crit. Rev. Biochem. Mol. Biol. 30:275-349, 1995), but a completely opposite conclusion was drawn by Eisenhaber et al. when using a different dataset constructed by themselves (Proteins 25:169-179, 1996). To resolve such a perplexing problem, predictions were performed by various approaches for the datasets from an objective database, the SCOP database (Murzin, Brenner, Hubbard, and Chothia. J. Mol. Biol. 247:536-540, 1995). According to SCOP, the classification of structural classes for protein domains is based on the evolutionary relationship and on the principles that govern the 3D structure of proteins, and hence is more natural and reliable. The results from both resubstitution tests and jackknife tests indicate that the overall rates of correct prediction by the algorithm incorporated with the coupling effect among different amino acid components are significantly higher than those by the algorithms without using such an effect. It is elucidated through an analysis that the main reasons for Eisenhaber et al. to have reached an opposite conclusion are the result of (1) misusing the component-coupled algorithm, and (2) using a conceptually incorrect rule to classify protein structural classes. The formulation and analysis presented in this article are conducive to clarify these problems, helping correctly to apply the prediction algorithm and interpret the results.

  • Research Article
  • Cite Count Icon 126
  • 10.1002/(sici)1097-0134(199710)29:2<172::aid-prot5>3.0.co;2-f
Understanding the recognition of protein structural classes by amino acid composition
  • Oct 1, 1997
  • Proteins: Structure, Function, and Genetics
  • Ivet Bahar + 3 more

Knowledge of amino acid composition, alone, is verified here to be sufficient for recognizing the structural class, alpha, beta, alpha + beta, or alpha/beta of a given protein with an accuracy of 81%. This is supported by results from exhaustive enumerations of all conformations for all sequences of simple, compact lattice models consisting of two types (hydrophobic and polar) of residues. Different compositions exhibit strong affinities for certain folds. Within the limits of validity of the lattice models, two factors appear to determine the choice of particular folds: 1) the coordination numbers of individual sites and 2) the size and geometry of non-bonded clusters. These two properties, collectively termed the distribution of non-bonded contacts, are quantitatively assessed by an eigenvalue analysis of the so-called Kirchhoff or adjacency matrices obtained by considering the non-bonded interactions on a lattice. The analysis permits the identification of conformations that possess the same distribution of non-bonded contacts. Furthermore, some distributions of non-bonded contacts are favored entropically, due to their high degeneracies. Thus, a competition between enthalpic and entropic effects is effective in determining the choice of a distribution for a given composition. Based on these findings, an analysis of non-bonded contacts in protein structures was made. The analysis shows that proteins belonging to the four distinct folding classes exhibit significant differences in their distributions of non-bonded contacts, which more directly explains the success in predicting structural class from amino acid composition.

  • Research Article
  • Cite Count Icon 45
  • 10.1016/s1476-9271(02)00087-7
Prediction of protein structural classes by a new measure of information discrepancy
  • Jul 1, 2003
  • Computational Biology and Chemistry
  • Lixia Jin + 2 more

  • Research Article
  • Cite Count Icon 45
  • 10.1002/prot.20821
Prediction of protein secondary structure content using amino acid composition and evolutionary information
  • Dec 12, 2005
  • Proteins: Structure, Function, and Bioinformatics
  • Soyoung Lee + 2 more

Determining protein structure and inferring function from it is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure contents. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using the information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using the expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if the evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing the expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.

Similar Papers
  • Research Article
  • Cite Count Icon 97
  • 10.1002/(sici)1097-0134(199606)25:2<157::aid-prot2>3.0.co;2-f
Prediction of secondary structural content of proteins from their amino acid composition alone. I. New analytic vector decomposition methods.
  • Jun 1, 1996
  • Proteins: Structure, Function, and Genetics
  • Frank Eisenhaber + 3 more

The predictive limits of the amino acid composition for the secondary structural content (percentage of residues in the secondary structural states helix, sheet, and coil) in proteins are assessed quantitatively. For the first time, techniques for prediction of secondary structural content are presented which rely on the amino acid composition as the only information on the query protein. In our first method, the amino acid composition of an unknown protein is represented by the best (in a least square sense) linear combination of the characteristic amino acid compositions of the three secondary structural types computed from a learning set of tertiary structures. The second technique is a generalization of the first one and takes into account also possible compositional couplings between any two sorts of amino acids. Its mathematical formulation results in an eigenvalue/eigenvector problem of the second moment matrix describing the amino acid compositional fluctuations of secondary structural types in various proteins of a learning set. Possible correlations of the principal directions of the eigenspaces with physical properties of the amino acids were also checked. For example, the first two eigenvectors of the helical eigenspace correlate with the size and hydrophobicity of the residue types respectively. As learning and test sets of tertiary structures, we utilized representative, automatically generated subsets of Protein Data Bank (PDB) consisting of non-homologous protein structures at the resolution thresholds ≤ 1.8 Å, ≤ 2.0 Å, ≤ 2.5 Å, and ≤ 3.0 Å. We show that the consideration of compositional couplings improves prediction accuracy, albeit not dramatically.
Whereas in the self-consistency test (learning with the protein to be predicted), a clear decrease of prediction accuracy with worsening resolution is observed, the jackknife test (leave the predicted protein out) yielded best results for the largest dataset (≤ 3.0 Å, almost no difference to the self-consistency test!), i.e., only this set, with more than 400 proteins, is sufficient for stable computation of the parameters in the prediction function of the second method. The average absolute errors in predicting the fraction of helix, sheet, and coil from amino acid composition of the query protein are 13.7, 12.6, and 11.4%, respectively, with r.m.s. deviations in the range of 8.6–11.8% for the 3.0 Å dataset in a jackknife test. The absolute precision of the average absolute errors is in the range of 1–3% as measured for other representative subsets of the PDB. Secondary structural content prediction methods found in the literature have been clustered in accordance with their prediction accuracies. To our surprise, much more complex secondary structure prediction methods utilized for the same purpose of secondary structural content prediction achieve prediction accuracies very similar to those of the present analytic techniques, implying that all the information beyond the amino acid composition is, in fact, mainly utilized for positioning the secondary structural state in the sequence but not for determination of the overall number of residues in a secondary structural type. This result implies that higher prediction accuracies cannot be achieved relying solely on the amino acid composition of an unknown query protein as prediction input. Our prediction program SSCP has been made available as a World Wide Web and E-mail service.
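
The first method in this abstract, representing a query composition as the best least-squares linear combination of three characteristic compositions, can be sketched as follows. This is a minimal illustration under stated assumptions and not the SSCP implementation: the fit is unconstrained and solved through the normal equations, and the characteristic compositions are hypothetical values over a toy 4-letter alphabet rather than 20-component vectors learned from PDB structures.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("characteristic compositions are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def decompose(query, basis):
    """Least-squares weights of `query` over the basis vectors (normal equations)."""
    gram = [[dot(u, v) for v in basis] for u in basis]
    rhs = [dot(u, query) for u in basis]
    return solve_linear(gram, rhs)


# Hypothetical characteristic compositions over a toy 4-letter alphabet,
# invented for illustration; real ones would have 20 components.
BASIS = {
    "helix": [0.5, 0.3, 0.1, 0.1],
    "sheet": [0.1, 0.2, 0.5, 0.2],
    "coil": [0.2, 0.2, 0.2, 0.4],
}
```

The returned weights are read as the estimated fractions of helix, sheet, and coil. For a query that is an exact mixture of the basis vectors, the fit recovers the mixing fractions; for real compositions the residual of the fit is generally nonzero.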

  • Research Article
  • Cite Count Icon 27
  • 10.1016/s0032-3861(01)00425-6
Protein secondary structure prediction based on the GOR algorithm incorporating multiple sequence alignment information
  • Oct 25, 2001
  • Polymer
  • A Kloczkowski + 3 more

  • Book Chapter
  • Cite Count Icon 4
  • 10.1007/978-3-642-04759-6_5
Data Mining for Protein Secondary Structure Prediction
  • Jan 1, 2009
  • Haitao Cheng + 3 more

Accurate protein secondary structure prediction from the amino acid sequence is essential for almost all theoretical and experimental studies on protein structure and function. After a brief discussion of application of data mining for optimization of crystallization conditions for target proteins we show that data mining of structural fragments of proteins from known structures in the protein data bank (PDB) significantly improves the accuracy of secondary structure predictions. The original method was proposed by us a few years ago and was termed fragment database mining (FDM) (Cheng H, Sen TZ, Kloczkowski A, Margaritis D, Jernigan RL (2005) Prediction of protein secondary structure by mining structural fragment database. Polymer 46:4314–4321). This method gives excellent accuracy for predictions if similar sequence fragments are available in our library of structural fragments, but is less successful if such fragments are absent in the fragments database. Recently we have improved secondary structure predictions further by combining FDM with classical GOR V (Kloczkowski A, Ting KL, Jernigan RL, Garnier J (2002a) Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins 49:154–66; Sen TZ, Jernigan RL, Garnier J, Kloczkowski A (2005) GOR V server for protein secondary structure prediction. Bioinformatics 21:2787–8) predictions to form a combined method, so-called consensus database mining (CDM) (Sen TZ, Cheng H, Kloczkowski A, Jernigan RL (2006) A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Sci 15:2499–506). FDM mines the structural segments of PDB, and utilizes structural information from the matching sequence fragments for the prediction of protein secondary structures. 
By combining it with the GOR V secondary structure prediction method, which is based on information theory and Bayesian statistics, coupled with evolutionary information from multiple sequence alignments (MSA), our CDM method guarantees improved accuracies of prediction. Additionally, with the constant growth in the number of new protein structures and folds in the PDB, the accuracy of the CDM method is clearly expected to increase in future. We have developed a publicly available CDM server (Cheng H, Sen TZ, Jernigan RL, Kloczkowski A (2007) Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: combining GOR V and Fragment Database Mining (FDM). Bioinformatics 23:2628–30) (http://gor.bb.iastate.edu/cdm) for protein secondary structure prediction.

  • Research Article
  • Cite Count Icon 377
  • 10.1093/protein/1.4.289
An algorithm for protein secondary structure prediction based on class prediction.
  • Jan 1, 1987
  • Protein Engineering, Design and Selection
  • G Deléage + 1 more

An algorithm has been developed to improve the success rate in the prediction of the secondary structure of proteins by taking into account the predicted class of the proteins. This method has been called the 'double prediction method' and consists of a first prediction of the secondary structure from a new algorithm which uses parameters of the type described by Chou and Fasman, and the prediction of the class of the proteins from their amino acid composition. These two independent predictions allow one to optimize the parameters calculated over the secondary structure database to provide the final prediction of secondary structure. This method has been tested on 59 proteins in the database (i.e. 10,322 residues) and yields 72% success in class prediction, 61.3% of residues correctly predicted for three states (helix, sheet and coil) and a good agreement between observed and predicted contents in secondary structure.

  • Book Chapter
  • Cite Count Icon 27
  • 10.1007/978-3-319-12883-2_19
Secondary and Tertiary Structure Prediction of Proteins: A Bioinformatic Approach
  • Nov 30, 2014
  • Minu Kesheri + 3 more

Correct prediction of secondary and tertiary structure of proteins is one of the major challenges in bioinformatics/computational biological research. Predicting the correct secondary structure is the key to predict a good/satisfactory tertiary structure of the protein which not only helps in prediction of protein function but also in prediction of sub-cellular localization. This chapter aims to explain the different algorithms and methodologies, which are used in secondary structure prediction. Similarly, tertiary structure prediction has also emerged as one of developing areas of bioinformatics/computational biological research owing to the large gap between the available number of protein sequences and the known experimentally solved structures. Because of time and cost intensive experimental methods, experimentally determined structures are not available for vast majority of the available protein sequences present in public domain databases. The primary aim of this chapter is to offer a detailed conceptual insight to the algorithms used for protein secondary and tertiary structure prediction. This chapter systematically illustrates flowchart for selecting the most accurate prediction algorithm among different categories for the target sequence against three categories of tertiary structure prediction methods. Out of the three methods, homology modeling which is considered as most reliable method is discussed in detail followed by strengths and limitations for each of these categories. This chapter also explains different practical and conceptual problems, obstructing the high accuracy of the protein structure in each of the steps for all the three methods of tertiary structure prediction. The popular hybrid methodologies which further club together a number of features such as structural alignments, solvent accessibility and secondary structure information are also discussed. 
Moreover, this chapter describes the meta-servers that generate a consensus result from many servers to build a protein model of high accuracy. Lastly, the scope for further research to bridge existing gaps and to develop better secondary and tertiary structure prediction algorithms is also highlighted.

  • Book Chapter
  • Cite Count Icon 5
  • 10.1016/b978-8-1312-2297-3.50005-9
Chapter 5 - Protein Structure Prediction
  • Jan 1, 2010
  • Protein Bioinformatics
  • M Michael Gromiha

  • Research Article
  • 10.1007/s0089490050078
Analyzing the Interplay Between Secondary and Tertiary Structure Predictions in Folding Simulations with a Genetic Algorithm
  • Apr 1, 1999
  • Journal of Molecular Modeling
  • Thomas Dandekar + 1 more

Three different strategies to tackle mispredictions from incorrect secondary structure prediction are analysed using 21 small proteins (22-121 amino acids; 1-6 secondary structure elements) with known three dimensional structures: (1) Testing accuracy of different secondary structure predictions and improving them by combinations, (2) correcting mispredictions exploiting protein folding simulations with a genetic algorithm and (3) applying and combining experimental data to refine predictions both for secondary structure and tertiary fold. We demonstrate that predictions from secondary structure prediction programs can be efficiently combined to reduce prediction errors from missed secondary structure elements. Further, up to two secondary structure elements (helices, strands) missed by secondary structure prediction were corrected by the genetic algorithm simulation. Finally, we show how input from experimental data is exploited to refine the predictions obtained.

  • Research Article
  • Cite Count: 6
  • 10.1007/s00894-013-1911-z
Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure
  • Aug 2, 2013
  • Journal of Molecular Modeling
  • S Saraswathi + 4 more

Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data to obtain better and faster convergence to more accurate secondary structure predicted results. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements: α-helix, β-strand and coil but rarely have the results been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.

  • Research Article
  • Cite Count: 132
  • 10.1002/pro.5560040214
Neural networks for secondary structure and structural class predictions
  • Feb 1, 1995
  • Protein Science
  • John‐Marc Chandonia + 1 more

A pair of neural network-based algorithms is presented for predicting the tertiary structural class and the secondary structure of proteins. Each algorithm realizes improvements in accuracy based on information provided by the other. Structural class prediction of proteins nonhomologous to any in the training set is improved significantly, from 62.3% to 73.9%, and secondary structure prediction accuracy improves slightly, from 62.26% to 62.64%. A number of aspects of neural network optimization and testing are examined. They include network overtraining and an output filter based on a rolling average. Secondary structure prediction results vary greatly depending on the particular proteins chosen for the training and test sets; consequently, an appropriate measure of accuracy reflects the more unbiased approach of "jackknife" cross-validation (testing each protein in the data-base individually).
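
The "jackknife" protocol mentioned in this abstract (testing each protein individually against a model trained on all the others) can be sketched in a few lines of Python. This is only an illustration: the nearest-centroid classifier, the function names, and the two-feature toy data are assumptions made for the example, not the neural networks or data set used by the authors.

```python
def train_nearest_centroid(features, labels):
    """Hypothetical stand-in model: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def jackknife_accuracy(features, labels, train):
    """Leave-one-out test: each sample is predicted by a model that never saw it."""
    correct = 0
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predict = train(train_x, train_y)
        if predict(features[i]) == labels[i]:
            correct += 1
    return correct / len(labels)


# Toy data (assumed): [helix fraction, strand fraction] per protein.
toy_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
toy_labels = ["alpha", "alpha", "beta", "beta"]
print(jackknife_accuracy(toy_features, toy_labels, train_nearest_centroid))
```

Because the held-out protein is excluded from training in every round, the resulting accuracy does not depend on one particular choice of training and test sets, which is the bias the abstract warns about.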

  • Research Article
  • Cite Count: 127
  • 10.1002/(sici)1097-0134(19990515)35:3<293::aid-prot3>3.0.co;2-l
New methods for accurate prediction of protein secondary structure
  • May 15, 1999
  • Proteins: Structure, Function, and Genetics
  • John-Marc Chandonia + 1 more

A primary and a secondary neural network are applied to secondary structure and structural class prediction for a database of 681 non-homologous protein chains. A new method of decoding the outputs of the secondary structure prediction network is used to produce an estimate of the probability of finding each type of secondary structure at every position in the sequence. In addition to providing a reliable estimate of the accuracy of the predictions, this method gives a more accurate Q3 (74.6%) than the cutoff method which is commonly used. Use of these predictions in jury methods improves the Q3 to 74.8%, the best available at present. On a database of 126 proteins commonly used for comparison of prediction methods, the jury predictions are 76.6% accurate. An estimate of the overall Q3 for a given sequence is made by averaging the estimated accuracy of the prediction over all residues in the sequence. As an example, the analysis is applied to the target beta-cryptogein, which was a difficult target for ab initio predictions in the CASP2 study; it shows that the prediction made with the present method (62% of residues correct) is close to the expected accuracy (66%) for this protein. The larger database and use of a new network training protocol also improve structural class prediction accuracy to 86%, relative to 80% obtained previously. Secondary structure content is predicted with accuracy comparable to that obtained with spectroscopic methods, such as vibrational or electronic circular dichroism and Fourier transform infrared spectroscopy.
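
As a rough illustration of the quantities named in this abstract, the sketch below computes Q3 (the percentage of residues whose three-state assignment is predicted correctly) and a per-sequence accuracy estimate obtained by averaging a per-residue estimate over the sequence. It assumes that the per-residue estimate is simply the probability assigned to the predicted state; the abstract does not spell out the decoding procedure, so that choice, the function names, and the toy probabilities are all hypothetical.

```python
def predict_states(probabilities):
    """Pick the most probable state (H, E, or C) at each position."""
    return "".join(max(p, key=p.get) for p in probabilities)


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def expected_q3(probabilities):
    """Average the probability of the predicted state over all residues.

    Assumption: this probability is used as the per-residue accuracy estimate.
    """
    return 100.0 * sum(max(p.values()) for p in probabilities) / len(probabilities)


# Toy per-residue probabilities (assumed values, not real network output).
toy_probs = [
    {"H": 0.8, "E": 0.1, "C": 0.1},
    {"H": 0.6, "E": 0.1, "C": 0.3},
    {"H": 0.2, "E": 0.2, "C": 0.6},
    {"H": 0.1, "E": 0.5, "C": 0.4},
]
toy_observed = "HHCC"

toy_predicted = predict_states(toy_probs)
print(toy_predicted, q3(toy_predicted, toy_observed), expected_q3(toy_probs))
```

Comparing `expected_q3` with the `q3` measured once the true structure is known mirrors the comparison the abstract makes between expected accuracy and achieved accuracy for a single target.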

  • Research Article
  • Cite Count: 11
  • 10.1002/(sici)1097-0134(19990515)35:3<293::aid-prot3>3.3.co;2-c
New methods for accurate prediction of protein secondary structure
  • May 15, 1999
  • Proteins: Structure, Function, and Genetics
  • John‐Marc Chandonia + 1 more

A primary and a secondary neural network are applied to secondary structure and structural class prediction for a database of 681 non-homologous protein chains. A new method of decoding the outputs of the secondary structure prediction network is used to produce an estimate of the probability of finding each type of secondary structure at every position in the sequence. In addition to providing a reliable estimate of the accuracy of the predictions, this method gives a more accurate Q3 (74.6%) than the cutoff method which is commonly used. Use of these predictions in jury methods improves the Q3 to 74.8%, the best available at present. On a database of 126 proteins commonly used for comparison of prediction methods, the jury predictions are 76.6% accurate. An estimate of the overall Q3 for a given sequence is made by averaging the estimated accuracy of the prediction over all residues in the sequence. As an example, the analysis is applied to the target β-cryptogein, which was a difficult target for ab initio predictions in the CASP2 study; it shows that the prediction made with the present method (62% of residues correct) is close to the expected accuracy (66%) for this protein. The larger database and use of a new network training protocol also improve structural class prediction accuracy to 86%, relative to 80% obtained previously. Secondary structure content is predicted with accuracy comparable to that obtained with spectroscopic methods, such as vibrational or electronic circular dichroism and Fourier transform infrared spectroscopy. Proteins 1999;35:293–306. © 1999 Wiley-Liss, Inc.

  • Research Article
  • Cite Count: 11
  • 10.3389/fbioe.2022.901018
Prediction of protein secondary structure based on an improved channel attention and multiscale convolution module.
  • Jul 22, 2022
  • Frontiers in bioengineering and biotechnology
  • Xin Jin + 4 more

Prediction of the protein secondary structure is a key issue in protein science. Protein secondary structure prediction (PSSP) aims to construct a function that can map the amino acid sequence into the secondary structure so that the protein secondary structure can be obtained according to the amino acid sequence. Driven by deep learning, the prediction accuracy of the protein secondary structure has been greatly improved in recent years. To explore a new technique of PSSP, this study introduces the concept of an adversarial game into the prediction of the secondary structure, and a conditional generative adversarial network (GAN)-based prediction model is proposed. We introduce a new multiscale convolution module and an improved channel attention (ICA) module into the generator to generate the secondary structure, and then a discriminator is designed to conflict with the generator to learn the complicated features of proteins. Then, we propose a PSSP method based on the proposed multiscale convolution module and ICA module. The experimental results indicate that the conditional GAN-based protein secondary structure prediction (CGAN-PSSP) model is workable and worthy of further study because of the strong feature-learning ability of adversarial learning.

  • Research Article
  • Cite Count: 51
  • 10.1002/pro.5560050422
The importance of larger data sets for protein secondary structure prediction with neural networks.
  • Apr 1, 1996
  • Protein Science
  • John‐Marc Chandonia + 1 more

A neural network algorithm is applied to secondary structure and structural class prediction for a database of 318 nonhomologous protein chains. Significant improvement in accuracy is obtained as compared with performance on smaller databases. A systematic study of the effects of network topology shows that, for the larger database, better results are obtained with more units in the hidden layer. In a 32-fold cross validated test, secondary structure prediction accuracy is 67.0%, relative to 62.6% obtained previously, without any evolutionary information on the sequence. Introduction of sequence profiles increases this value to 72.9%, suggesting that the two types of information are essentially independent. Tertiary structural class is predicted with 80.2% accuracy, relative to 73.9% obtained previously. The use of a larger database is facilitated by the introduction of a scaled conjugate gradient algorithm for optimizing the neural network. This algorithm is about 10-20 times as fast as the standard steepest descent algorithm.

  • Research Article
  • Cite Count: 14
  • 10.1186/s43141-022-00404-6
Plant catalase in silico characterization and phylogenetic analysis with structural modeling
  • Aug 19, 2022
  • Journal of Genetic Engineering and Biotechnology
  • Takio Nene + 2 more

Background: Catalase (EC 1.11.1.6) is a heme-containing tetrameric enzyme that plays a critical role in signaling and hydrogen peroxide metabolism. It was the first enzyme to be crystallized and isolated. Catalase is a well-known industrial enzyme used in diagnostic and analytical methods in the form of biomarkers and biosensors, as well as in the textile, paper, food, and pharmaceutical industries. In silico analysis of CAT genes and proteins has gained increased interest, with an emphasis on the development of biomarkers and drug designs. The present work aims to understand the evolutionary relationships of catalases across plant species and to analyze their physicochemical characteristics, homology, phylogenetic tree construction, secondary structure prediction, and 3D modeling of protein sequences, with validation by a variety of conventional computational methods, to assist researchers in better understanding the structure of these proteins. Results: Around 65 plant catalase sequences were computationally evaluated and subjected to bioinformatics assessment for physicochemical characterization, multiple sequence alignment, phylogenetic construction, motif and domain identification, and secondary and tertiary structure prediction. The phylogenetic tree revealed six unique clusters, and the diversity of plant catalases was found to be largest for Oryza sativa. The proteins were primarily thermostable and hydrophilic, as evidenced by a relatively high aliphatic index and a negative GRAVY value. The 5 sequence motifs were uniformly distributed, with a width of 50, and their best-matching residue sequences resemble the plant catalase PLN02609 superfamily. Using SOPMA, the predicted secondary structure of the protein sequences revealed the predominance of the random coil. The predicted 3D CAT model from Arabidopsis thaliana was a thermostable homotetrameric protein with a weight of 59 kDa, and its structural validation was confirmed by PROCHECK, ERRAT, Verify3D, and the Ramachandran plot. Analysis of the functional relationships of the query sequence revealed glutathione reductase as its closest interacting protein. Conclusions: This theoretical in silico analysis of plant catalases provides insight into their physicochemical characteristics, function, structure, and evolutionary behavior, and helps in exploring protein structure-function relationships when crystal structures are unavailable.
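
The two physicochemical indices cited in this abstract can be computed directly from a sequence. The sketch below assumes the standard definitions: GRAVY as the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index as 100 × (x_Ala + 2.9·x_Val + 3.9·(x_Ile + x_Leu)), where x is a mole fraction. The abstract does not say which tool the authors used, so the function names and toy sequences are illustrative only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


def aliphatic_index(sequence):
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    n = len(seq)
    x_ala = seq.count("A") / n
    x_val = seq.count("V") / n
    x_ile_leu = (seq.count("I") + seq.count("L")) / n
    return 100.0 * (x_ala + 2.9 * x_val + 3.9 * x_ile_leu)


# Toy sequences (assumed, not catalase data).
print(gravy("VILA"), aliphatic_index("VILA"))
print(gravy("DEKR"), aliphatic_index("DEKR"))
```

A negative GRAVY value, as reported for the plant catalases, indicates that hydrophilic residues dominate on average, while a higher aliphatic index is commonly read as a sign of greater thermostability.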

  • Research Article
  • Cite Count: 37
  • 10.1186/1471-2105-12-154
Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure
  • May 13, 2011
  • BMC Bioinformatics
  • Zafer Aydin + 3 more

Background: Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight. Results: In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove up to 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements. Conclusions: We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure. Datasets and source code used in this study are available at http://noble.gs.washington.edu/proj/pssp.
