The Maximum Entropy Fallacy Redux?

Erik Aurell

doi:10.1371/journal.pcbi.1004777

Open Access

https://doi.org/10.1371/journal.pcbi.1004777

Abstract

Protein structure can be predicted in silico given sufficiently good templates, as demonstrated in successive installments of the biennial Critical Assessment of protein Structure Prediction (CASP) competition [1]. Since the number of known protein sequences currently increases much faster than the number of known protein structures, and is likely to continue to do so for the foreseeable future, reliable ab initio protein structure prediction, without recourse to templates, would be highly desirable. It is currently not possible to sample unrestrained folding well enough to reach predictions close to the native structure. Recently, very important progress has been made on the restricted problem of predicting spatial amino acid contacts in proteins from many homologous sequences [2]. While it is not yet clear whether these techniques, collectively known as direct coupling analysis (DCA), can be leveraged to systematically predict full protein structures, preliminary results indicate that this may be the case [3,4]. DCA has also been used with success for several related problems, such as predicting structures of protein complexes [4] or alternative protein conformations [5]. The central ingredient in DCA is to learn generative probabilistic models from a set of homologous protein sequences. These models are chosen from an exponential family with linear and quadratic interactions, commonly referred to as Potts models (see Eq 1) [6]. In the literature, this procedure has been motivated by maximum entropy arguments [7–10]. In this Perspective, I will point out that these arguments are mistaken and that the successes of DCA can have nothing to do with maximum entropy. To the contrary, maximum entropy hides the real nature of DCA and the questions it raises, and is thus an obstacle to progress. In addition, maximum entropy has a long and contested history in statistical physics, the field in which it was first introduced [11,12]. In that field, definite and precise results derived over the last decade and a half have conclusively falsified maximum entropy. Appeals to maximum entropy are therefore prejudicial to a more general acceptance and adoption of DCA.
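
Eq 1 is not reproduced on this page. For orientation, the Potts model conventionally used in the DCA literature has the form sketched below. The symbols (fields h_i, couplings J_ij, partition function Z, alphabet size q, sequence length N) follow the usual convention and are an assumption here, not a transcription of the paper's own equation.

```latex
% Conventional Potts model over an aligned sequence (\sigma_1,\dots,\sigma_N),
% with each \sigma_i \in \{1,\dots,q\}. Notation assumed, not taken from the paper.
P(\sigma_1,\dots,\sigma_N)
  = \frac{1}{Z}\exp\Big(\sum_{i=1}^{N} h_i(\sigma_i)
  + \sum_{1\le i<j\le N} J_{ij}(\sigma_i,\sigma_j)\Big),
\qquad
Z = \sum_{\sigma_1,\dots,\sigma_N}\exp\Big(\sum_{i} h_i(\sigma_i)
  + \sum_{i<j} J_{ij}(\sigma_i,\sigma_j)\Big).
```

The maximum entropy motivation presents this distribution as the entropy maximizer among all distributions that reproduce the empirical one-site and two-site frequencies of the alignment, with h_i and J_ij playing the role of Lagrange multipliers.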

Highlights

Eq 1 is not the most unbiased representation of our knowledge of the system. It is a representation of the subset of our knowledge about the system that remains after the data have first been compressed from the whole multiple sequence alignment to f_i(k) and f_ij(k,l).
The other class of approximate inference methods widely used in direct coupling analysis (DCA), known as pseudolikelihood methods, does lead to consistent estimators; these methods instead keep all the data and never compress to f_i(k) and f_ij(k,l).
The conceptual appeal of the maximum entropy argument is that it immediately leads to the Boltzmann distribution of equilibrium statistical physics.
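
The compression step mentioned in the highlights can be sketched in a few lines. The example below is a minimal illustration, assuming an alignment already encoded as integers; the function name `site_frequencies` and the toy data are hypothetical, and the sketch omits the sequence reweighting and pseudocounts that practical DCA pipelines typically apply.

```python
import numpy as np

def site_frequencies(msa, q):
    """Compress an integer-coded alignment to f_i(k) and f_ij(k,l).

    msa: array of shape (B, N) with entries in 0..q-1
         (B sequences, N alignment columns).
    Returns f_i of shape (N, q) and f_ij of shape (N, N, q, q).
    """
    msa = np.asarray(msa)
    onehot = np.eye(q)[msa]                      # shape (B, N, q)
    f_i = onehot.mean(axis=0)                    # f_i[i, k]
    # f_ij[i, j, k, l]: fraction of sequences with symbol k at i and l at j
    f_ij = np.einsum("bik,bjl->ijkl", onehot, onehot) / msa.shape[0]
    return f_i, f_ij

# Toy alignment (hypothetical data): 4 sequences, 3 columns, 2 symbols.
toy_msa = np.array([[0, 0, 1],
                    [0, 0, 1],
                    [1, 1, 0],
                    [1, 1, 1]])
f_i, f_ij = site_frequencies(toy_msa, q=2)
```

A moment-matching fit of Eq 1 sees only `f_i` and `f_ij`; the individual rows of `toy_msa` are discarded at this point. A pseudolikelihood fit, by contrast, would work with the rows themselves.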

Summary

Protein structure can be predicted in silico given sufficiently good templates, as demonstrated in successive installments of the biennial Critical Assessment of protein Structure Prediction (CASP) competition [1]. Very important progress has been made on the restricted problem of predicting spatial amino acid contacts in proteins from many homologous sequences [2]. While it is not yet clear whether these techniques, collectively known as direct coupling analysis (DCA), can be leveraged to systematically predict full protein structures, preliminary results indicate that this may be the case [3,4]. The central ingredient in DCA is to learn generative probabilistic models from a set of homologous protein sequences. These models are chosen from an exponential family with linear and quadratic interactions, commonly referred to as Potts models (see Eq 1) [6]. Appeals to maximum entropy are prejudicial to a more general acceptance and adoption of DCA.

Maximum Entropy and DCA

The Elementary Counterargument

Maximum Entropy in Statistical Physics

The Problem

Journal: PLOS Computational Biology
Publication Date: May 12, 2016
Citations: 18
License type: CC BY 4.0
