How strongly do sequence conservation patterns and empirical scales correlate with exposure patterns of transmembrane helices of membrane proteins?

Yungki Park, Volkhard Helms

doi:10.1002/bip.20569

Abstract

Given the difficulty of determining high-resolution structures of helical membrane proteins, sequence-based prediction methods can be useful in elucidating the diverse physiological processes mediated by this important class of proteins. Predicting the angular orientations of transmembrane (TM) helices about their helix axes, based on helix parameters from electron microscopy data, is a classical problem in this regard. This problem has triggered the development of a number of different empirical scales. More recently, sequence conservation patterns have also been used to improve predictions. Empirical scales and sequence conservation patterns (collectively termed "prediction scales") have also found frequent application in other areas of membrane protein research, for example in structure modeling and in the prediction of buried TM helices. This trend is expected to grow in the near future unless there are revolutionary developments in the experimental characterization of membrane proteins. Thus, it is timely and imperative to carry out a comprehensive benchmark of the prediction scales proposed so far to determine their strengths and weaknesses. In the current analysis, we use the exposure patterns of TM helices as a gold standard: a prediction scale that correlated perfectly with the exposure patterns of TM helices would allow buried residues (or buried faces) of TM helices to be predicted with an accuracy of 100%. Our analysis reveals several important points. (1) Sequence conservation patterns are much more strongly correlated with the exposure patterns of TM helices than empirical scales are. (2) Scales that were specifically parameterized using structure data (structure-based scales) display stronger correlation than hydrophobicity-based scales, as expected. (3) A non-negligible difference is observed among the structure-based scales in how strongly they correlate with exposure patterns, suggesting that not every learning algorithm is equally effective. (4) A straightforward framework for optimally combining sequence conservation patterns and empirical scales is proposed, which reveals that the improvements gained from combining the two sources of information are modest in almost all cases. In turn, this calls for the development of fundamentally different scales that capture the essentials of membrane protein folding if substantial improvements are to be achieved.
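As a rough illustration of the kind of comparison described above, the following Python sketch measures how well a per-residue score tracks an exposure pattern and scans for the mixing weight that maximizes that agreement. It is not the procedure used in the paper: the choice of the Pearson coefficient, the z-score normalization, the grid search over a single weight, the function names, and the toy numbers are all assumptions made for illustration. Scores are oriented so that higher values mean "more exposed" (for example, sequence variability rather than conservation).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def zscore(x):
    """Shift to zero mean and unit (population) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = sqrt(sum((a - m) ** 2 for a in x) / n)
    if sd == 0:
        raise ValueError("cannot normalize a constant sequence")
    return [(a - m) / sd for a in x]


def best_mix(variability, scale, exposure, steps=100):
    """Grid-search w in [0, 1] for the mix w*variability + (1-w)*scale
    that correlates most strongly with the exposure pattern.
    Returns (best_w, best_r). Hypothetical helper, not from the paper."""
    zv, zs = zscore(variability), zscore(scale)
    best_w, best_r = 0.0, -2.0
    for i in range(steps + 1):
        w = i / steps
        mixed = [w * v + (1 - w) * s for v, s in zip(zv, zs)]
        try:
            r = pearson(mixed, exposure)
        except ValueError:
            continue  # the mix happened to be constant at this weight
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r


# Toy per-residue values for one helix (made up, for illustration only).
toy_variability = [0.1, 0.9, 0.4, 0.8, 0.2]
toy_scale = [0.5, 0.2, 0.9, 0.1, 0.3]
toy_exposure = [0.2, 0.8, 0.5, 0.7, 0.1]

w, r = best_mix(toy_variability, toy_scale, toy_exposure)
print(f"best weight on variability: {w:.2f}, correlation: {r:.3f}")
```

Comparing the correlation at the best weight with the correlation of each input alone gives a simple sense of how much, if anything, the combination adds.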

Full Text