Abstract

The topology of most experimentally determined protein domains is defined by the relative arrangement of secondary structure elements, i.e. α-helices and β-strands, which make up 50–70% of the sequence. Pairing of β-strands defines the topology of β-sheets. The packing of side chains between α-helices and β-sheets defines the majority of the protein core. Often, limited experimental datasets restrain the position of secondary structure elements while lacking detail with respect to loop or side chain conformation. At the same time, the regular structure and reduced flexibility of secondary structure elements make their interactions more predictable than those of flexible loops and side chains. To determine the topology of the protein in such settings, we introduce a tailored knowledge-based energy function that evaluates the arrangement of secondary structure elements only. Based on the amino acid Cβ atom coordinates within secondary structure elements, potentials for amino acid pair distance, amino acid environment, secondary structure element packing, β-strand pairing, loop length, radius of gyration, contact order and secondary structure prediction agreement are defined. Separate penalty functions exclude conformations with clashes between amino acids or secondary structure elements and conformations with loops that cannot be closed. Each individual term discriminates for native-like protein structures. The composite potential significantly enriches for native-like models in three different databases of 10,000–12,000 protein models in 80–94% of the cases. The corresponding application, “BCL::ScoreProtein,” is available at www.meilerlab.org.
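Two of the quantities named above, radius of gyration and contact order, have simple geometric definitions that can be illustrated from Cβ coordinates alone. The following is a minimal sketch and not the BCL::ScoreProtein implementation: the function names, the 8 Å contact cutoff, and the use of relative contact order (mean sequence separation of contacting pairs divided by chain length) are assumptions chosen for illustration, and the sketch computes only the raw descriptors, not the potentials derived from them.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid.

    coords: list of (x, y, z) tuples, e.g. one C-beta position per residue.
    """
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    mean_sq = sum(math.dist(p, centroid) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)


def relative_contact_order(coords, cutoff=8.0):
    """Mean sequence separation of contacting residue pairs over chain length.

    Two residues count as "in contact" when their points lie within
    `cutoff` angstroms; the 8.0 default is an illustrative assumption.
    """
    n = len(coords)
    total_separation = 0
    contacts = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                contacts += 1
    if contacts == 0:
        return 0.0
    return total_separation / (contacts * n)


if __name__ == "__main__":
    # Toy chain: four points on a line, 5 angstroms apart (hypothetical data).
    chain = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
    print(radius_of_gyration(chain))
    print(relative_contact_order(chain))
```

Both functions take a plain list of coordinate tuples, so they can be applied to any subset of residues, for example only those inside secondary structure elements.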

Highlights

  • Many protein structures have been determined using experimental techniques such as X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy

  • Of the approximately 69,000 protein structures deposited in the Protein Data Bank (PDB) as of August 2011, X-ray crystallography [1] contributed 88% and nuclear magnetic resonance (NMR) [2] contributed almost all of the remaining 12% [3]

  • We expect this scoring to be beneficial for certain settings in de novo protein structure determination: (1) when folding large proteins with complex topology, where simultaneous sampling of secondary structure element (SSE) arrangements and loop conformations would create a size limit for de novo protein structure determination

Summary

Introduction

Many protein structures have been determined using experimental techniques such as X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy. The biological importance of these proteins justifies large efforts to collect limited experimental datasets that describe their structure. Often these data restrain the topology of the protein, i.e. the relative placement of secondary structure elements (SSEs). To facilitate construction and evaluation of protein structural models from such limited datasets, a tailored energy function that only evaluates the relative positioning of SSEs in topologies would be of great value. This energy function should predict the free energy of all states an amino acid sequence can access, and the lowest free energy should be associated with the native structure [14]. It has been shown that these potentials do not always distinguish native-like from incorrect structures [17].
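For orientation, knowledge-based potentials are commonly derived from structural statistics through the inverse Boltzmann relation. The expression below is the generic textbook form, given here as an assumption about the general approach rather than as the exact definition of any term in this work; the symbols $x$ (a structural feature such as a pair distance), $P_{\mathrm{obs}}$ (its frequency in known structures) and $P_{\mathrm{ref}}$ (its frequency in a reference state) are introduced for illustration only.

```latex
E(x) = -k_{B} T \,\ln \frac{P_{\mathrm{obs}}(x)}{P_{\mathrm{ref}}(x)}
```

Features observed more often than in the reference state receive a negative (favorable) energy, and features observed less often receive a positive one.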

