Percolation in protein sequence space.

Patrick C F Buchholz, Silvia Fademrecht, Jürgen Pleiss

doi:10.1371/journal.pone.0189646

Abstract

The currently known protein sequences are not distributed equally in sequence space, but cluster into families. Analyzing the cluster size distribution gives a glimpse of the large and unknown extant protein sequence space, which has been explored during evolution. For six protein superfamilies with different fold and function, the cluster size distributions followed a power law with slopes between 2.4 and 3.3, which represent upper limits to the cluster distribution of extant sequences. The power law distribution of cluster sizes is in accordance with percolation theory and strongly supports connectedness of extant sequence space. Percolation of extant sequence space has three major consequences: (1) It transforms our view of sequence space as a highly connected network where each sequence has multiple neighbors, and each pair of sequences is connected by many different paths. A high degree of connectedness is a necessary condition of efficient evolution, because it overcomes the possible blockage by sign epistasis and reciprocal sign epistasis. (2) The Fisher exponent is an indicator of connectedness and saturation of sequence space of each protein superfamily. (3) All clusters are expected to be connected by extant sequences that become apparent as a higher portion of extant sequence space becomes known. Being linked to biochemically distinct homologous families, bridging sequences are promising enzyme candidates for applications in biotechnology because they are expected to have substrate ambiguity or catalytic promiscuity.
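To make the notion of a power-law slope concrete, the following is a minimal sketch of how an exponent could be estimated from a list of cluster sizes. It is not the authors' fitting procedure, which this page does not describe. It assumes the cluster size distribution has the form n(s) ~ s^(-tau) above a cutoff s_min and uses the standard continuous maximum-likelihood estimator. Real cluster sizes are integers, so the continuous form is only an approximation. The function names, the cutoff, and the synthetic data are all illustrative assumptions.

```python
import math
import random


def estimate_exponent(sizes, s_min=1.0):
    """Continuous maximum-likelihood estimate of tau for n(s) ~ s**(-tau), s >= s_min."""
    tail = [s for s in sizes if s >= s_min]
    if not tail:
        raise ValueError("no cluster sizes at or above s_min")
    log_sum = sum(math.log(s / s_min) for s in tail)
    if log_sum == 0:
        raise ValueError("all sizes equal s_min; the exponent is undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(tau, n, s_min=1.0, seed=0):
    """Draw n synthetic sizes from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so the base of the power is never zero.
    return [s_min * (1.0 - rng.random()) ** (-1.0 / (tau - 1.0)) for _ in range(n)]


# Synthetic check (hypothetical data): sizes drawn with tau = 2.5,
# close to the lower end of the range quoted in the abstract.
synthetic_sizes = sample_power_law(tau=2.5, n=20000)
tau_hat = estimate_exponent(synthetic_sizes)
print(f"estimated exponent: {tau_hat:.2f}")
```

With real data, the choice of s_min matters: too small a cutoff includes sizes where the power law does not yet hold, while too large a cutoff leaves few clusters to fit.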

Highlights

Despite the rapidly growing amount of DNA data due to advances in DNA sequencing techniques, only a tiny fraction of all protein sequences existing in the biosphere has been sequenced so far
The known protein sequence space is rapidly increasing, but it represents only a tiny fraction of the extant sequence space that has been explored during evolution
The extant sequence space represents a fraction p of the much bigger sequence space coding for functional proteins
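The role of the occupied fraction p can be illustrated with a toy site-percolation model. The sketch below is not taken from the paper. It assumes a binary alphabet, treats every sequence of a fixed length as functional, marks each one as extant independently with probability p, and connects two extant sequences when they differ at exactly one position. The sequence length, the values of p, and the function names are hypothetical choices for illustration.

```python
import random


def cluster_sizes(length, p, seed=0):
    """Sizes of connected clusters of occupied binary sequences (single-mutation links)."""
    rng = random.Random(seed)
    # Each of the 2**length sequences is encoded as an integer bit pattern.
    occupied = {s for s in range(2 ** length) if rng.random() < p}
    seen = set()
    sizes = []
    for start in occupied:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # depth-first search over one cluster
            node = stack.pop()
            size += 1
            for bit in range(length):
                neighbor = node ^ (1 << bit)  # flip one position
                if neighbor in occupied and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def largest_fraction(length, p, seed=0):
    """Fraction of occupied sequences that belong to the largest cluster."""
    sizes = cluster_sizes(length, p, seed)
    return sizes[0] / sum(sizes) if sizes else 0.0


sparse = largest_fraction(length=12, p=0.05)
dense = largest_fraction(length=12, p=0.5)
print(f"largest cluster fraction at p=0.05: {sparse:.3f}")
print(f"largest cluster fraction at p=0.5:  {dense:.3f}")
```

In this toy model, a low p leaves the occupied sequences scattered over many small clusters, while a high p joins almost all of them into one connected cluster. Real protein sequence space has a 20-letter alphabet and a non-random set of functional sequences, so the numbers here carry no quantitative meaning for actual superfamilies.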

Summary

Introduction

Despite the rapidly growing amount of DNA data due to advances in DNA sequencing techniques, only a tiny fraction of all protein sequences existing in the biosphere has been sequenced so far. While we currently know the sequences of almost 10^8 proteins [1], the number of extant sequences was estimated to be 10^34, and up to 10^43 different protein sequences might have been explored during 4 Gyr of evolution [2]. Though this number seems large, it is infinitesimally small compared to the theoretical sequence space (10^400 possible sequences for a medium-sized protein), and it would be highly improbable to find functional proteins by random search [3]. The TEM β-lactamase family has a very high microdiversity, and the variants form a dense single network with nodes connected by single mutations [8].
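The orders of magnitude in this paragraph are easier to compare on a logarithmic scale. The short sketch below works through the arithmetic using the estimates quoted above (about 10^8 known, 10^34 extant, and up to 10^43 explored sequences). The protein length of 300 residues is an assumption made here for illustration, since the text only says "medium-sized", and the 20-letter amino acid alphabet is the standard one.

```python
import math


def log10_sequence_space(length, alphabet_size=20):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet_size)


# Estimates quoted in the text, as powers of ten.
log10_known = 8
log10_extant = 34
log10_explored = 43

# Assumed length of 300 residues (hypothetical choice for "medium-sized").
log10_theoretical = log10_sequence_space(300)

# Differences of logarithms correspond to ratios of the raw counts.
log10_known_fraction = log10_known - log10_extant
log10_explored_fraction = log10_explored - log10_theoretical

print(f"theoretical space: about 10^{log10_theoretical:.0f} sequences")
print(f"known / extant: 10^{log10_known_fraction}")
print(f"explored / theoretical: about 10^{log10_explored_fraction:.0f}")
```

Under these assumptions, the known sequences cover roughly one part in 10^26 of the extant sequences, and even the upper estimate of explored sequences covers a vanishing share of the theoretical space.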

Methods

Results

Conclusion

Journal: PloS one
Publication Date: Dec 20, 2017
Citations: 9
License type: CC BY 4.0

Similar Papers

Constraints on the expansion of paralogous protein families.
Conor J Mcclune ... Michael T Laub
Current Biology | VOL. 30
01 May 2020

Genome-wide nucleotide-level mammalian ancestor reconstruction.
Benedict Paten ... Javier Herrero
Genome Research | VOL. 18
10 Oct 2008

An in silico Exploration of the Neutral Network in Protein Sequence Space
Takuyo Aita ... Yuzuru Husimi
Journal of Theoretical Biology | VOL. 221
01 Apr 2003

Exact correspondence between walk in nucleotide and protein sequence spaces.
Dmitry N Ivankov
PLOS ONE | VOL. 12
11 Aug 2017
