Toward Inferring Potts Models for Phylogenetically Correlated Sequence Data

Edwin Rodriguez Horta, Pierre Barrat-Charlaix, Martin Weigt

doi:10.3390/e21111090

Abstract

Global coevolutionary models of protein families have become increasingly popular due to their capacity to predict residue–residue contacts from sequence information, but also to predict fitness effects of amino acid substitutions or to infer protein–protein interactions. The central idea in these models is to construct a probability distribution, a Potts model, that reproduces single and pairwise frequencies of amino acids found in natural sequences of the protein family. This approach treats sequences from the family as independent samples, completely ignoring phylogenetic relations between them. This simplification is known to lead to potentially biased estimates of the parameters of the model, decreasing their biological relevance. Current workarounds for this problem, such as reweighting sequences, are poorly understood and not principled. Here, we propose an inference scheme that takes the phylogeny of a protein family into account in order to correct biases in estimating the frequencies of amino acids. Using artificial data, we show that a Potts model inferred using these corrected frequencies performs better in predicting contacts and fitness effects of mutations. First tests on real protein data are also presented; they are only partially successful.
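
To make the quantities in the abstract concrete, the following is a minimal sketch of how single and pairwise amino acid frequencies are commonly estimated from an alignment under the sequence-reweighting workaround. It does not implement the phylogeny-aware scheme proposed in the paper. The function names, the integer encoding of the alignment, and the similarity threshold `theta` are assumptions made for illustration only.

```python
import numpy as np

def sequence_weights(msa, theta=0.2):
    """Weight each sequence by 1 / (number of sequences closer than theta).

    msa: integer array of shape (M, L), one encoded sequence per row.
    theta: normalized Hamming distance threshold (hypothetical default).
    """
    msa = np.asarray(msa)
    # Pairwise normalized Hamming distances, shape (M, M).
    dist = (msa[:, None, :] != msa[None, :, :]).mean(axis=2)
    # Each sequence counts itself, so the denominator is at least 1.
    return 1.0 / (dist < theta).sum(axis=1)

def weighted_frequencies(msa, q, weights):
    """Return weighted single-site (L, q) and pairwise (L, q, L, q) frequencies."""
    msa = np.asarray(msa)
    onehot = np.eye(q)[msa]                       # shape (M, L, q)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize weights to sum to 1
    f1 = np.einsum("m,mia->ia", w, onehot)
    f2 = np.einsum("m,mia,mjb->iajb", w, onehot, onehot)
    return f1, f2
```

With a toy alignment of two identical sequences and one distinct sequence, the two duplicates share a single unit of weight, so the distinct sequence is not outvoted. This is the intuition behind reweighting; it treats redundancy only through pairwise similarity and does not use a phylogenetic tree.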

Highlights

Based on the rapidly growing availability of biological sequence data [1,2,3], statistical models of sequences have gained considerable interest in recent years [4,5,6,7].
The direct coupling analysis (DCA) [8] takes inspiration from inverse statistical physics [9]: it aims to describe the sequence variability of sets of evolutionarily related protein sequences, so-called homologous protein families, via Potts models.
Strong couplings between two sites in the Potts model are a good indication that the corresponding amino acids are in contact in the protein fold.

Summary

Introduction

Based on the rapidly growing availability of biological sequence data [1,2,3], statistical models of sequences have gained considerable interest in recent years [4,5,6,7]. In this context, the direct coupling analysis (DCA) [8] takes inspiration from inverse statistical physics [9]: it aims to describe the sequence variability of sets of evolutionarily related protein sequences, so-called homologous protein families, via Potts models. Such a model gives a probability $P(A) = \frac{1}{Z} \exp\left\{ \sum_{i} h_i(a_i) + \sum_{i<j} J_{ij}(a_i, a_j) \right\}$ to each sequence $A = (a_1, \dots, a_L)$.
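
As an illustration of what a Potts model assigns to a sequence, here is a minimal sketch that evaluates the unnormalized log-probability from site fields and pairwise couplings, and normalizes it by brute-force enumeration. It assumes the sign convention in which the probability is proportional to the exponential of the sum of fields and couplings; the array layout (`h` of shape (L, q), `J` of shape (L, q, L, q)) and the function names are hypothetical. Enumeration is feasible only for toy sizes, since the number of sequences grows as q^L.

```python
from itertools import product

import numpy as np

def potts_log_weight(seq, h, J):
    """Unnormalized log-probability: sum_i h_i(a_i) + sum_{i<j} J_ij(a_i, a_j)."""
    seq = np.asarray(seq)
    idx = np.arange(len(seq))
    fields = h[idx, seq].sum()
    # pair[i, j] = J[i, seq[i], j, seq[j]] via broadcast integer indexing.
    pair = J[idx[:, None], seq[:, None], idx[None, :], seq[None, :]]
    # Keep only i < j so that each pair of sites is counted once.
    return fields + np.triu(pair, k=1).sum()

def potts_probabilities(h, J):
    """Exact probabilities by enumerating all q**L sequences (toy sizes only)."""
    L, q = h.shape
    seqs = list(product(range(q), repeat=L))
    logw = np.array([potts_log_weight(s, h, J) for s in seqs])
    p = np.exp(logw - logw.max())   # subtract the max for numerical stability
    return seqs, p / p.sum()
```

For example, with two sites, two states, zero fields, and a single coupling of ln 2 favoring the pair of states (0, 0), that sequence receives weight 2 against weight 1 for each of the other three, so its probability is 2/5.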

Journal: Entropy	Publication Date: Nov 7, 2019
Citations: 24	License type: CC BY 4.0
