Protein structure can be predicted in silico given sufficiently good templates, as demonstrated in successive installments of the biennial Critical Assessment of protein Structure Prediction (CASP) competition [1]. Since the number of known protein sequences is increasing much faster than the number of known protein structures, and is likely to continue to do so for the foreseeable future, reliable ab initio protein structure prediction, without recourse to templates, would be highly desirable. At present, unrestrained folding simulations cannot sample conformations well enough to yield predictions close to the native structure. Recently, important progress has been made on the restricted problem of predicting spatial amino acid contacts in proteins from many homologous sequences [2]. While it is not yet clear whether these techniques, collectively known as direct coupling analysis (DCA), can be leveraged to systematically predict full protein structures, preliminary results indicate that this may be the case [3,4]. DCA has also been used with success for several related problems, such as predicting structures of protein complexes [4] or alternative protein conformations [5]. The central ingredient in DCA is learning generative probabilistic models from a set of homologous protein sequences. These models are chosen from an exponential family with linear and quadratic interactions, commonly referred to as Potts models (see Eq 1) [6]. In the literature, this procedure has been motivated by maximum entropy arguments [7–10]. In this Perspective, I will point out that these arguments are mistaken and that the successes of DCA can have nothing to do with maximum entropy. On the contrary, maximum entropy hides the real nature of DCA and the questions it raises, and is thus an obstacle to progress. In addition, maximum entropy has a long and contested history in statistical physics, the field in which it was first introduced [11,12]. Definite and precise results derived in that field over the last decade and a half have conclusively falsified maximum entropy. Appeals to maximum entropy are therefore prejudicial to a more general acceptance and adoption of DCA.
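
For orientation, the exponential family with linear and quadratic interactions mentioned above is commonly written as follows. This is a sketch of the standard Potts form, not necessarily the notation of Eq 1: the symbols $L$ (number of aligned positions), $s_i$ (amino acid at position $i$), $h_i$ (single-site fields), $J_{ij}$ (pairwise couplings), and $Z$ (normalization constant) are assumed here for illustration.

\[
P(s_1,\dots,s_L) \;=\; \frac{1}{Z}\,\exp\!\Big(\sum_{i=1}^{L} h_i(s_i) \;+\; \sum_{1\le i<j\le L} J_{ij}(s_i,s_j)\Big),
\qquad
Z \;=\; \sum_{s_1,\dots,s_L} \exp\!\Big(\sum_{i=1}^{L} h_i(s_i) \;+\; \sum_{1\le i<j\le L} J_{ij}(s_i,s_j)\Big).
\]

In this form the $h_i$ terms are the linear interactions and the $J_{ij}$ terms the quadratic ones. In typical DCA pipelines, position pairs whose inferred couplings $J_{ij}$ are strongest are the ones proposed as spatial contacts.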