The intrinsic dimension of protein sequence evolution.

Elena Facco, Elena Tea Russo, Andrea Pagnani, Alessandro Laio

doi:10.1371/journal.pcbi.1006767

Abstract

It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. Here we quantify the amount of variability allowed in protein sequence evolution by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID is a measure of the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family, and moreover it is very similar in different families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids in a sequence, confirming the importance of correlations between mutations at different sites. However, we demonstrate that correlations are not sufficient to explain the small value of the ID we observe in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach in which correlations are accounted for, is typically significantly larger than the value observed in natural protein families. We further show that a critical factor in reproducing the natural ID is taking the phylogeny of the sequences into account.
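The abstract does not say how the ID is estimated. As an illustration only, the sketch below assumes the TWO-NN estimator, which uses just the ratio μ = r2/r1 between each point's second- and first-nearest-neighbor distances: if the data look locally d-dimensional, log μ is exponentially distributed with rate d, so the maximum-likelihood estimate is d = N / Σ log μ. The helper names `hamming` and `two_nn_id` are hypothetical, and measuring distances between aligned sequences with the Hamming distance is likewise an assumption.

```python
import math
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def two_nn_id(points: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Maximum-likelihood TWO-NN estimate of the intrinsic dimension.

    For each point, mu = r2 / r1 (second- over first-nearest-neighbor distance).
    If log(mu) is exponential with rate d, the MLE is d = n / sum(log mu).
    """
    if len(points) < 3:
        raise ValueError("need at least three points")
    log_mus = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = ds[0], ds[1]
        if r1 == 0:
            continue  # exact duplicates leave mu undefined, so skip them
        log_mus.append(math.log(r2 / r1))
    total = sum(log_mus)
    if total == 0:
        raise ValueError("all neighbor ratios equal 1; ID is undefined")
    return len(log_mus) / total


# Toy check: points on a 2D plane embedded in 3D should give an ID near 2.
rng = random.Random(0)
plane = [(rng.random(), rng.random(), 0.0) for _ in range(500)]
print(round(two_nn_id(plane, math.dist), 2))
```

Passing aligned sequences together with `hamming` works mechanically, but integer distances produce many ties (μ = 1), so this naive version gives at best a rough indication and should not be read as the procedure used in the paper.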

Highlights

Protein sequence evolution is an extremely important process in living organisms.
The families are extracted from clans that are very different from each other: clan CL0489, for instance, includes antifreeze proteins, while clan CL0378 consists of enzymes including luciferase.
Within a family, some entries share only 20% of their amino acids, so the number of mutations observed in a family is enormous.
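Sequence identity between two aligned entries can be computed as the percentage of aligned columns holding the same residue. Conventions differ on how gaps are treated; the sketch below assumes gap characters `-` are excluded from the comparison, and the function names are hypothetical. Taking the minimum over all pairs in a family gives the kind of low-identity figure quoted in the highlights.

```python
from itertools import combinations
from typing import Iterable


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of gap-free aligned columns where a and b carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap (an assumed convention).
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def min_pairwise_identity(msa: Iterable[str]) -> float:
    """Lowest identity over all pairs of aligned sequences (needs at least two)."""
    return min(percent_identity(a, b) for a, b in combinations(msa, 2))


# Two aligned 10-residue sequences that agree only at the first two columns.
print(percent_identity("ACDEFGHIKL", "ACWYVPRSTM"))  # → 20.0
```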

Summary

Introduction

Protein sequence evolution is an extremely important process in living organisms. During evolution, insertions, deletions and substitutions can change a sequence substantially. Even though the sequence similarity between members of the same family can be extremely low, the multiple sequence alignment (MSA) of a protein family immediately reveals patterns: amino acids in specific columns of the MSA are often conserved, and mutations in different columns are in many cases correlated. This observation underlies statistical models for assessing the probability that a protein sequence belongs to a family [1] or for predicting the three-dimensional structure of the protein from the MSA [2, 3]. The frequent occurrence of the same amino acid in a column of the MSA, together with covariation between different columns, suggests that evolution modifies sequences along a number of directions much smaller than the bare dimension of the space sampled by randomly substituting amino acids.
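Conserved columns and covarying columns can both be quantified with simple column statistics. In this sketch, chosen for illustration, conservation is measured by the Shannon entropy of a column (0 for a fully conserved column) and covariation by the mutual information between two columns, I(i, j) = H(i) + H(j) − H(i, j). Practical analyses usually add sequence reweighting and pseudocounts, which are omitted here; the toy alignment and helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def column(msa: Sequence[str], i: int) -> List[str]:
    """Residues found in column i of the alignment."""
    return [seq[i] for seq in msa]


def entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of the empirical symbol frequencies."""
    n = len(symbols)
    h = 0.0
    for count in Counter(symbols).values():
        p = count / n
        h -= p * math.log(p)
    return h


def mutual_information(msa: Sequence[str], i: int, j: int) -> float:
    """I(i, j) = H(i) + H(j) - H(i, j) for columns i and j."""
    ci, cj = column(msa, i), column(msa, j)
    return entropy(ci) + entropy(cj) - entropy(list(zip(ci, cj)))


# Toy alignment: column 1 is conserved, columns 0 and 3 covary (A<->K, G<->R).
msa = ["ACDK", "ACEK", "GCDR", "GCER"]
print(entropy(column(msa, 1)))        # → 0.0
print(mutual_information(msa, 0, 3))  # log(2), about 0.693
print(mutual_information(msa, 0, 2))  # about 0 (independent columns)
```

The perfect A↔K, G↔R pairing between columns 0 and 3 gives the maximal mutual information for two two-letter columns, log 2, whereas columns 0 and 2 vary independently and give zero.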

Journal: PLoS Computational Biology
Publication Date: Apr 8, 2019
Citations: 19
License type: CC BY 4.0

Similar Papers

Learning to Read and Write in the Language of Proteins
Helen T Hobbs ... Chang C Liu
GEN Biotechnology, Vol. 2, 01 Apr 2023

Evolution of protein sequences and structures
Todd C Wood ... William R Pearson
Journal of Molecular Biology, Vol. 291, 01 Aug 1999

PANTHER: Protein families and subfamilies modeled on the divergence of function
Paul D Thomas
26 Sep 2005

Indel-Seq-Gen: A New Protein Family Simulator Incorporating Domains, Motifs, and Indels
C L Strope ... S D Scott
Molecular Biology and Evolution, Vol. 24, 05 Dec 2006
