Sequence statistics of tertiary structural motifs reflect protein stability.

Fan Zheng, Gevorg Grigoryan

doi:10.1371/journal.pone.0178272

Abstract

The Protein Data Bank (PDB) has been a key resource for learning general rules of sequence-structure relationships in proteins. Quantitative insights have been gained by defining geometric descriptors of structure (e.g., distances, dihedral angles, solvent exposure) and observing their distributions and sequence preferences. Here we argue that as the PDB continues to grow, it may become unnecessary to reduce structure into a set of elementary descriptors. Instead, it could be possible to deduce quantitative sequence-structure relationships in the context of precisely defined complex structural motifs by mining the PDB for closely matching backbone geometries. To validate this idea, we turned to the task of predicting changes in protein stability upon amino-acid substitution, a difficult problem of broad significance. We defined non-contiguous tertiary motifs (TERMs) around a protein site of interest and extracted sequence preferences from ensembles of closely matching substructures in the PDB to predict mutational stability changes at the site, ΔΔGm. We demonstrate that these ensemble statistics predict ΔΔGm on par with state-of-the-art statistical and machine-learning methods on large thermodynamic datasets, and outperform these, along with a leading structure-based modeling approach, when tested in the context of unbiased diverse mutations. Further, we show that the performance of the TERM-based method is directly related to the amount of available relevant structural data, so it improves automatically as the PDB grows. This relationship also provides a means of estimating prediction accuracy. Our results demonstrate that: 1) statistics of non-contiguous structural motifs in the PDB encode fundamental sequence-structure relationships related to protein thermodynamic stability, and 2) the PDB is now large enough that such statistics are already useful in practice, with their accuracy expected to continue increasing as the database grows. These observations suggest new ways of using structural data towards addressing problems of computational structural biology.
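As an illustration only, the idea of turning sequence preferences from an ensemble of structural matches into a stability-related score can be sketched as a simple log-odds calculation. This is not the authors' scoring model, which the abstract does not specify. The function name `log_odds_score`, the uniform pseudocount, and the toy list of observed residues are all assumptions made for this sketch.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def log_odds_score(observed, wild_type, mutant, pseudocount=1.0):
    """Return ln(f_wt / f_mut) for residues observed at one aligned site.

    `observed` holds the amino acid found at the site of interest in each
    structural match (hypothetical input; how matches are found is not
    modeled here). A positive score means the mutant residue is seen less
    often than the wild type in the ensemble. The pseudocount is an
    assumption of this sketch that keeps unseen residues at nonzero frequency.
    """
    counts = Counter(observed)
    total = len(observed) + pseudocount * len(AMINO_ACIDS)
    f_wt = (counts[wild_type] + pseudocount) / total
    f_mut = (counts[mutant] + pseudocount) / total
    return math.log(f_wt / f_mut)

# Toy ensemble: residues seen at the site across ten hypothetical matches.
matches = "LLLLLLIIVA"
score = log_odds_score(matches, wild_type="L", mutant="A")
print(f"L->A log-odds score: {score:.3f}")
```

In this toy ensemble leucine dominates, so substituting it with alanine yields a positive score, and swapping the two arguments flips the sign. Relating such a score to an experimental ΔΔGm would require calibration that this sketch does not attempt.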

Highlights

Quantifying sequence-structure relationships in proteins has been a long-standing fundamental challenge in computational structural biology
Our results show that tertiary structural motifs and their Protein Data Bank (PDB)-based sequence statistics encode sequence-structure relationships reflective of fundamental thermodynamics of structure
To test how sensitive this result was to the choice of λ, we repeated all calculations with several other values, confirming that the initial intuitive choice was in the right range and that, in general, differences were small until λ deviated considerably from the 200-1,000 range

Summary

Introduction

Quantifying sequence-structure relationships in proteins has been a long-standing fundamental challenge in computational structural biology. Knowledge-based potentials have contributed significantly towards progress in such grand challenges as structure prediction, and a broad variety of structural features have been exploited towards attaining better statistical potentials, including backbone dihedral angles, atomic distances and densities, bond orientations, residue burial states, and inter-residue contacts, to name a few [9,10,11,12,13,14]. Statistics of these structural features, and their associated sequence propensities, have been combined into scoring functions in a variety of ways [9,15,16,17,18,19].



Journal: PLOS ONE	Publication Date: May 26, 2017
Citations: 19	License type: CC BY 4.0


