Undersampling and the Inference of Coevolution in Proteins

Yaakov Kleeorin, William P. Russ, Rama Ranganathan, Olivier Rivoire

doi:10.2139/ssrn.3821957

Abstract

Protein structure, function, and evolution depend on local and collective epistatic interactions between amino acids. A powerful approach to defining these interactions is to construct models of couplings between amino acids that reproduce the empirical statistics (frequencies and correlations) observed in the sequences comprising a protein family. The top couplings are then interpreted. Here, we show that, as currently implemented, this inference is always biased, a problem that fundamentally arises from the distinct scales at which epistasis occurs in proteins in the context of limited sampling. We show that these issues explain both the ability of current approaches to predict tertiary contacts between amino acids and their inability to clearly expose larger networks of functionally relevant, collectively evolving residues called sectors. This work provides a necessary foundation for more deeply understanding and improving evolution-based models of proteins.

Full Text