Prediction of secondary structural content of proteins from their amino acid composition alone. I. New analytic vector decomposition methods.

Frank Eisenhaber, Federica Imperiale, Cornelius Frömmel, Patrick Argos

doi:10.1002/(sici)1097-0134(199606)25:2<157::aid-prot2>3.0.co;2-f

Abstract

The predictive limits of the amino acid composition for the secondary structural content (percentage of residues in the secondary structural states helix, sheet, and coil) in proteins are assessed quantitatively. For the first time, techniques for prediction of secondary structural content are presented which rely on the amino acid composition as the only information on the query protein. In our first method, the amino acid composition of an unknown protein is represented by the best (in a least-squares sense) linear combination of the characteristic amino acid compositions of the three secondary structural types computed from a learning set of tertiary structures. The second technique is a generalization of the first and also takes into account possible compositional couplings between any two types of amino acids. Its mathematical formulation results in an eigenvalue/eigenvector problem for the second-moment matrix describing the amino acid compositional fluctuations of secondary structural types across the proteins of a learning set. Possible correlations of the principal directions of the eigenspaces with physical properties of the amino acids were also checked. For example, the first two eigenvectors of the helical eigenspace correlate with the size and hydrophobicity of the residue types, respectively. As learning and test sets of tertiary structures, we utilized representative, automatically generated subsets of the Protein Data Bank (PDB) consisting of non-homologous protein structures at the resolution thresholds ≤ 1.8 Å, ≤ 2.0 Å, ≤ 2.5 Å, and ≤ 3.0 Å. We show that the consideration of compositional couplings improves prediction accuracy, albeit not dramatically.
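The first method described above can be illustrated with a minimal sketch. It assumes that the three characteristic compositions are already available as the columns of a 20 x 3 matrix, and it reads the least-squares coefficients as the content fractions. The function name `predict_content`, the clipping of negative coefficients, and the renormalization to unit sum are assumptions made for this illustration, not details taken from the paper; the basis below is synthetic random data, not real protein statistics.

```python
import numpy as np


def predict_content(composition, basis):
    """Estimate helix/sheet/coil fractions from an amino acid composition.

    composition: length-20 vector of amino acid fractions of the query protein.
    basis: 20 x 3 matrix; each column is the characteristic composition of
           one secondary structural type (helix, sheet, coil).
    """
    # Best linear combination of the three columns in the least-squares sense.
    coeffs, *_ = np.linalg.lstsq(basis, composition, rcond=None)
    # Assumption of this sketch: negative coefficients are clipped and the
    # result is renormalized so the three fractions sum to one.
    coeffs = np.clip(coeffs, 0.0, None)
    total = coeffs.sum()
    if total == 0.0:
        return np.full(basis.shape[1], 1.0 / basis.shape[1])
    return coeffs / total


# Synthetic demonstration data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
basis = rng.dirichlet(np.ones(20), size=3).T   # shape (20, 3)
true_fractions = np.array([0.5, 0.3, 0.2])
composition = basis @ true_fractions           # exact mixture of the columns

predicted = predict_content(composition, basis)
print(predicted)
```

Because the synthetic query composition is an exact mixture of the three columns, the fit recovers the mixing fractions up to floating-point error. Real compositions do not lie exactly in the span of the three characteristic vectors, which is where the prediction error discussed in the abstract arises.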
Whereas the self-consistency test (learning with the protein to be predicted) shows a clear decrease of prediction accuracy with worsening resolution, the jackknife test (leaving the predicted protein out) yielded the best results for the largest dataset (≤ 3.0 Å, with almost no difference from the self-consistency test); i.e., only this set, with more than 400 proteins, is sufficient for stable computation of the parameters in the prediction function of the second method. The average absolute errors in predicting the fractions of helix, sheet, and coil from the amino acid composition of the query protein are 13.7, 12.6, and 11.4%, respectively, with r.m.s. deviations in the range of 8.6–11.8% for the 3.0 Å dataset in a jackknife test. The absolute precision of the average absolute errors is in the range of 1–3%, as measured for other representative subsets of the PDB. Secondary structural content prediction methods found in the literature have been clustered in accordance with their prediction accuracies. To our surprise, much more complex secondary structure prediction methods utilized for the same purpose of secondary structural content prediction achieve prediction accuracies very similar to those of the present analytic techniques, implying that all the information beyond the amino acid composition is, in fact, mainly utilized for positioning the secondary structural state in the sequence but not for determination of the overall number of residues in a secondary structural type. This result implies that higher prediction accuracies cannot be achieved relying solely on the amino acid composition of an unknown query protein as prediction input. Our prediction program SSCP has been made available as a World Wide Web and E-mail service.
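The difference between the self-consistency and jackknife tests can be sketched as follows. This is a hedged illustration, not the SSCP implementation: it assumes a hypothetical array `counts` of shape (proteins, 3 states, 20 amino acid types) holding residue counts, pools those counts to obtain the characteristic compositions, and uses a simple least-squares predictor with clipping and renormalization (both assumptions of this sketch). The data are random integers, so the resulting errors carry no biological meaning.

```python
import numpy as np


def fit_basis(counts):
    """Pool residue counts over proteins; return a 20 x 3 matrix whose
    columns are the amino acid compositions of helix, sheet, and coil."""
    pooled = counts.sum(axis=0).astype(float)              # shape (3, 20)
    return (pooled / pooled.sum(axis=1, keepdims=True)).T  # shape (20, 3)


def predict(composition, basis):
    """Least-squares decomposition; clipping and renormalization are
    assumptions of this sketch."""
    coeffs, *_ = np.linalg.lstsq(basis, composition, rcond=None)
    coeffs = np.clip(coeffs, 0.0, None)
    total = coeffs.sum()
    if total == 0.0:
        return np.full(basis.shape[1], 1.0 / basis.shape[1])
    return coeffs / total


def mean_absolute_error(counts, leave_out):
    """Average absolute error per state over all proteins.

    leave_out=True  -> jackknife test (query protein excluded from learning).
    leave_out=False -> self-consistency test (query protein included).
    """
    n = counts.shape[0]
    errors = np.empty((n, 3))
    full_basis = fit_basis(counts)
    for i in range(n):
        if leave_out:
            basis = fit_basis(np.delete(counts, i, axis=0))
        else:
            basis = full_basis
        per_type = counts[i].sum(axis=0).astype(float)   # counts per amino acid
        composition = per_type / per_type.sum()
        observed = counts[i].sum(axis=1) / counts[i].sum()  # true fractions
        errors[i] = np.abs(predict(composition, basis) - observed)
    return errors.mean(axis=0)


# Synthetic counts (hypothetical): 12 proteins, 3 states, 20 residue types.
rng = np.random.default_rng(1)
counts = rng.integers(1, 30, size=(12, 3, 20))

mae_jackknife = mean_absolute_error(counts, leave_out=True)
mae_self = mean_absolute_error(counts, leave_out=False)
print("jackknife:", mae_jackknife)
print("self-consistency:", mae_self)
```

With a small learning set, removing one protein changes the pooled compositions noticeably, so the two tests can diverge; with a large set the two bases become nearly identical, which is consistent with the abstract's observation for the dataset of more than 400 proteins.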

Similar Papers

Prediction of secondary structural content of proteins from their amino acid composition alone. II. The paradox with secondary structural class.
Frank Eisenhaber ... Cornelius Frömmel
Proteins | VOL. 25
01 Jun 1996

Relations between size, secondary structure content and amino acid composition in globular proteins
G.A Sagnella
International Journal of Biochemistry | VOL. 14
01 Jan 1981

Simultaneous prediction of protein secondary structure and transmembrane spans
Julia Koehler Leman, ... Jens Meiler
Proteins: Structure, Function, and Bioinformatics | VOL. 81
10 Apr 2013

Peptides secondary structure prediction with neural networks: a criterion for building appropriate learning sets.
C Ruggiero ... R Sacile
IEEE transactions on bio-medical engineering | VOL. 40
01 Jan 1992

Journal: Proteins
Publication Date: Jun 1, 1996
Citations: 95
