Correlation-compressed direct-coupling analysis

Chen-Yi Gao, Erik Aurell, Hai-Jun Zhou

doi:10.1103/physreve.98.032407

Abstract

Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to predictions of structural contacts in proteins and other areas of biological data analysis. The corresponding inference problems are challenging since the normalization constant (partition function) of the Ising/Potts distributions cannot be computed efficiently on large instances. Different ways to address this issue have hence given rise to a substantial methodological literature. In this paper we investigate how these methods could be used on much larger datasets than studied previously. We focus on a central aspect, that in practice these inference problems are almost always severely under-sampled, and the operational result is almost always a small set of leading (largest) predictions. We therefore explore an approach where the data is pre-filtered based on empirical correlations, which can be computed directly even for very large problems. Inference is only used on the much smaller instance in a subsequent step of the analysis. We show that in several relevant model classes such a combined approach gives results of almost the same quality as the computationally much more demanding inference on the whole dataset. We also show that results on whole-genome epistatic couplings that were obtained in a recent computation-intensive study can be retrieved by the new approach. The method of this paper hence opens up the possibility to learn parameters describing pair-wise dependencies in whole genomes in a computationally feasible and expedient manner.
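The two-step idea in the abstract (filter by empirical correlations, then run inference only on the reduced instance) can be illustrated with a minimal sketch. The abstract does not state which filtering rule or which inference method the authors use, so everything specific here is an assumption made for illustration: variables are binary spins (+1/-1), each variable is scored by its largest absolute correlation with any other variable, and the inference step is a naive mean-field inversion of the regularized correlation matrix. The function name `correlation_compressed_dca` and the parameters `n_keep` and `pseudocount` are hypothetical.

```python
import numpy as np


def correlation_compressed_dca(samples, n_keep, pseudocount=0.1):
    """Illustrative two-step sketch (assumed details, not the paper's exact method).

    samples : (M, N) array of +/-1 spins, M samples of N variables.
    n_keep  : number of variables retained after correlation filtering.
    Returns (kept, couplings): indices of the retained variables and the
    inferred coupling matrix on that reduced set.
    """
    samples = np.asarray(samples, dtype=float)

    # Step 1: empirical connected correlations over all N variables.
    corr = np.cov(samples, rowvar=False, bias=True)
    np.fill_diagonal(corr, 0.0)

    # Assumed filter: score each variable by its strongest absolute
    # correlation with any other variable and keep the top n_keep.
    score = np.abs(corr).max(axis=1)
    kept = np.sort(np.argsort(score)[-n_keep:])

    # Step 2: inference on the reduced instance only.
    # Assumed method: naive mean-field, J = -(C^-1) off the diagonal,
    # with a diagonal pseudocount to stabilize under-sampled data.
    c_sub = np.cov(samples[:, kept], rowvar=False, bias=True)
    c_sub += pseudocount * np.eye(len(kept))
    couplings = -np.linalg.inv(c_sub)
    np.fill_diagonal(couplings, 0.0)
    return kept, couplings
```

In this sketch the expensive matrix inversion acts on an `n_keep` by `n_keep` matrix rather than the full N by N one, which is the source of the savings; the correlation pass over all variables remains a direct computation.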


Journal: Physical Review E
Publication Date: Sep 11, 2018
Citations: 12

Similar Papers

A comprehensive assessment of sequence-based and template-based methods for protein contact prediction
Sitao Wu ... Yang Zhang
Bioinformatics | VOL. 24
22 Feb 2008

Nested sampling, statistical physics and the Potts model
Manuel J Pfeifenberger ... Wolfgang Von Der Linden
Journal of Computational Physics | VOL. 375
29 Aug 2018

What's behind bioinformatics?
Lorraine K Tanabe
Trends in Biotechnology | VOL. 19
26 Jan 2001

The Maximum Entropy Fallacy Redux?
Erik Aurell
PLOS Computational Biology | VOL. 12
12 May 2016
