Abstract

Protein structure can be predicted in silico given sufficiently good templates, as demonstrated in successive installments of the biennial Critical Assessment of protein Structure Prediction (CASP) competition [1]. Since the number of known protein sequences currently increases much faster than the number of known protein structures, and is likely to continue to do so in the foreseeable future, reliable ab initio protein structure prediction, without recourse to templates, would be highly desirable. It is currently not possible to sample unrestrained folding well enough to obtain predictions close to the native structure. Recently, very important progress has been made on the restricted problem of predicting spatial amino acid contacts in proteins from many homologous sequences [2]. While it is not yet clear if these techniques, collectively known as direct coupling analysis (DCA), can be leveraged to systematically predict full protein structures, preliminary results indicate that this may be the case [3,4]. DCA has also been used with success for several related problems, such as predicting structures of protein complexes [4] or alternative protein conformations [5]. The central ingredient in DCA is to learn generative probabilistic models from a set of homologous protein sequences. These models are chosen from an exponential family with linear and quadratic interactions, commonly referred to as Potts models (see Eq 1) [6]. In the literature, this procedure has been motivated by maximum entropy arguments [7–10]. In this Perspective, I will point out that these arguments are mistaken and that the successes of DCA can have nothing to do with maximum entropy. To the contrary, maximum entropy hides the real nature of DCA and the questions it raises, and is thus an obstacle to progress. In addition, maximum entropy has a long and contested history in statistical physics, the field in which it was first introduced [11,12]. Definite and precise results derived there in the last decade and a half have conclusively falsified maximum entropy. Appeals to maximum entropy are therefore prejudicial to a more general acceptance and adoption of DCA.

Highlights

  • Eq 1 is not the most unbiased representation of our knowledge of the system. It is a representation of the subset of our knowledge about the system that remains after the data have first been compressed from the whole multiple sequence alignment to f_i(k) and f_ij(k,l).

  • The other class of approximate inference methods widely used in direct coupling analysis (DCA), known as pseudolikelihood methods, does lead to consistent estimators. These methods keep all the data and never compress to f_i(k) and f_ij(k,l).

  • The conceptual appeal of the maximum entropy argument is that it immediately leads to the Boltzmann distribution of equilibrium statistical physics.
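
The compression mentioned in the first two highlights can be illustrated with a minimal sketch. It assumes a toy alignment encoded as integers and plain, unweighted counting; the function name `msa_frequencies`, the toy data, and the alphabet size `q` are illustrative assumptions, not taken from the paper. Practical DCA pipelines typically add sequence reweighting and pseudocounts, which are omitted here.

```python
import numpy as np

def msa_frequencies(msa, q):
    """Compress an alignment to one- and two-site frequencies.

    msa : (B, L) integer array, B sequences of length L, entries in 0..q-1
          (hypothetical encoding chosen for this sketch)
    Returns f_i with shape (L, q) and f_ij with shape (L, q, L, q).
    """
    msa = np.asarray(msa)
    B, L = msa.shape
    onehot = np.eye(q)[msa]                      # (B, L, q) indicator of each symbol
    f_i = onehot.mean(axis=0)                    # f_i(k): fraction with symbol k at site i
    # f_ij(k,l): fraction with symbol k at site i AND symbol l at site j
    f_ij = np.einsum("bik,bjl->ikjl", onehot, onehot) / B
    return f_i, f_ij

# Toy alignment: 4 sequences, 3 sites, alphabet of size 3 (illustrative only)
toy_msa = np.array([
    [0, 1, 2],
    [0, 1, 1],
    [1, 0, 2],
    [0, 1, 2],
])
f_i, f_ij = msa_frequencies(toy_msa, q=3)
```

After this step only L·q one-site and (L·q)² two-site numbers remain, however many sequences the alignment contained. Information about which full sequences produced those co-occurrences is discarded, which is the sense of "compressed" in the highlights; a pseudolikelihood approach would instead work from `toy_msa` itself.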

Summary

The Maximum Entropy Fallacy Redux?

Protein structure can be predicted in silico given sufficiently good templates, as demonstrated in successive installments of the biennial Critical Assessment of protein Structure Prediction (CASP) competition [1]. Very important progress has been made on the restricted problem of predicting spatial amino acid contacts in proteins from many homologous sequences [2]. While it is not yet clear if these techniques, collectively known as direct coupling analysis (DCA), can be leveraged to systematically predict full protein structures, preliminary results indicate that this may be the case [3,4]. The central ingredient in DCA is to learn generative probabilistic models from a set of homologous protein sequences. These models are chosen from an exponential family with linear and quadratic interactions, commonly referred to as Potts models (see Eq 1) [6]. Appeals to maximum entropy are prejudicial to a more general acceptance and adoption of DCA.

Maximum Entropy and DCA
The Elementary Counterargument
Maximum Entropy in Statistical Physics
The Problem