Prediction of secondary structural content of proteins from their amino acid composition alone. II. The paradox with secondary structural class.

Frank Eisenhaber, Patrick Argos, Cornelius Frömmel

doi:10.1002/(sici)1097-0134(199606)25:2<169::aid-prot3>3.0.co;2-d

Abstract

The success rates reported for secondary structural class prediction with different methods are contradictory. On the one hand, the problem of recognizing the secondary structural class of a protein from its amino acid composition alone appears completely solved by simply applying a jury decision with an elliptically scaled distance function: Chou and coworkers have repeatedly published prediction accuracies near 100% (see Crit. Rev. Biochem. Mol. Biol. 30:275-349, 1995). On the other hand, traditional secondary structure prediction techniques achieve success rates of about 70% for the secondary structural state per residue and about 75% for structural class, and only with extensive input information (full sequence of the query protein, its amino acid composition and length, multiple alignments with homologous sequences). In this article, we resolve the paradox by considering (1) the definition of secondary structural class; (2) how representative the test set of protein tertiary structures is of the current state of the Protein Data Bank (PDB); and (3) the real impact of amino acid composition on secondary structural class. We formulate three objective criteria for a reasonable definition of secondary structural classes and show that only the criterion of Nakashima et al. (J. Biochem. 99:153-162, 1986) complies with all of them. Only this definition matches the distribution of secondary structural content in representative PDB subsets, whereas other criteria leave many proteins (up to 65% of all PDB entries) simply unassigned. We critically review specialized secondary structural class prediction methods, especially those of Chou and coworkers, which claim almost 100% accuracy using only amino acid composition, and explain why these reported accuracies exceed those of secondary structure predictions from multiple alignments.
We show (i) that these techniques rely on a preselection of test sets that removes irregular proteins and other proteins without any class assignment (about 35% of all PDB entries); and (ii) that even for preselected representative test sets, the success rate drops to 60% or lower for a 4-type classification (alpha, beta, alpha + beta, alpha/beta). The prediction accuracies fall to about 50% if the secondary structural class definition of Nakashima et al. is applied and only a few irregular proteins are preselected and removed from automatically generated, representative subsets of the PDB. We have applied two new vector decomposition methods for secondary structural content prediction from amino acid composition alone, with and without consideration of amino acid compositional coupling in the learning set of tertiary structures, respectively, to the problem of class prediction. They achieve about 60% correct assignment among four classes (alpha, beta, mixed, irregular), performing as well as single-sequence secondary structure prediction methods such as GORIII and COMBI. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein and that consideration of compositional coupling does not improve the prediction success. The prediction program SSCP, which offers secondary structural class assignment for query compositions and sequences, has been made available as a World Wide Web and E-mail service.
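To make the two quantities discussed in the abstract concrete, the following Python sketch computes an amino acid composition vector from a sequence and assigns one of the four classes (alpha, beta, mixed, irregular) from helix and strand content fractions. It is only an illustration under stated assumptions: the function names and the threshold defaults `helix_cut` and `strand_cut` are hypothetical placeholders, not values taken from the abstract, and the code is not the SSCP program or the authors' vector decomposition method.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-component amino acid composition as fractions.

    Non-standard characters are ignored; the fractions sum to 1.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def assign_class(helix_fraction, strand_fraction,
                 helix_cut=0.15, strand_cut=0.10):
    """Assign a 4-type class from secondary structural content.

    helix_cut and strand_cut are illustrative placeholder thresholds
    (assumptions, not values stated in the abstract). Every protein
    receives a class, so none is left unassigned.
    """
    has_helix = helix_fraction >= helix_cut
    has_strand = strand_fraction >= strand_cut
    if has_helix and has_strand:
        return "mixed"
    if has_helix:
        return "alpha"
    if has_strand:
        return "beta"
    return "irregular"
```

A class definition of this threshold form covers the whole plane of helix and strand content, which is the property the abstract emphasizes when it notes that other criteria leave many PDB entries unassigned.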

Journal: Proteins
Publication Date: Jun 1, 1996
Citations: 82
