Do Natural Proteins Differ from Random Sequences Polypeptides? Natural vs. Random Proteins Classification Using an Evolutionary Neural Network

Davide De Lucrezia, Debora Slanzi, Giovanni Minervini, Fabio Polticelli, Irene Poli, Ricard V Solé

doi:10.1371/journal.pone.0036634

Open Access

Abstract

Are extant proteins the exquisite result of natural selection, or are they random sequences slightly edited by evolution? This question has puzzled biochemists for a long time, and several groups have addressed it by comparing natural protein sequences with completely random ones, reaching contradictory conclusions. Previous work in the literature focused on the analysis of primary structure in an attempt to identify possible signatures of evolutionary editing. In contrast, in this work we compare a set of 762 natural proteins with an average length of 70 amino acids and an equal number of completely random ones of comparable length on the basis of their structural features. We use an ad hoc Evolutionary Neural Network Algorithm (ENNA) to assess whether and to what extent natural proteins are edited from random polypeptides, employing 11 structure-related variables (i.e. net charge, volume, surface area, coil, alpha helix, beta sheet, percentage of coil, percentage of alpha helix, percentage of beta sheet, percentage of secondary structure and surface hydrophobicity). The ENNA distinguishes natural proteins from random ones with an accuracy of 94.36%. Furthermore, we study the structural features of 32 random polypeptides misclassified as natural ones to unveil any structural similarity to natural proteins. The results show that random proteins misclassified by the ENNA exhibit a significant fold similarity to portions or subdomains of extant proteins at atomic resolution. Altogether, our results suggest that natural proteins are significantly edited from random polypeptides and that evolutionary editing can be readily detected by analyzing structural features. We also show that the ENNA, employing simple structural descriptors, can predict whether a protein chain is natural or random.

Highlights

The question of whether extant proteins are the exquisite result of natural selection or rather random co-polymers slightly edited by evolution has stirred an intense discussion for the last twenty years because of its implications for the origin of life [1], macromolecule aetiology [2,3] and evolution at large [3,4,5]. From the molecular point of view, protein evolution can be viewed as a search and optimization process in sequence space that identifies sequences capable of fulfilling a functional requirement.
We initially investigated a set of 902 natural proteins (Nat) whose tertiary structure was experimentally resolved and a set of 20494 completely random protein sequences (Rnd), generated using a uniform amino acid frequency distribution and showing no significant homology to natural ones.
The first striking outcome is that natural proteins generally show a broader distribution than random ones for most of the variables investigated (Figures 1 and 2). This general feature can be explained by considering that random proteins are statistical copolymers, so their structural features are centered around the mean with a variance equal to that expected from the corresponding probability density function.
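
The sampling scheme for the Rnd set and the narrow spread expected of statistical copolymers can be illustrated with a short sketch. It assumes the 20 standard amino acids drawn independently with probability 1/20 each, a length of 70 residues chosen purely for illustration, and a crude net-charge proxy (+1 for K and R, -1 for D and E). The function names, the seed, and the charge rule are illustrative assumptions and not the authors' pipeline; the homology filter against natural sequences is not reproduced.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Assumed, simplified charge rule: only K/R (+1) and D/E (-1) contribute.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def random_sequence(length, rng):
    """Draw each residue independently and uniformly from the 20 amino acids."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def net_charge(seq):
    """Sum the assumed per-residue charges over the sequence."""
    return sum(CHARGE.get(residue, 0) for residue in seq)


def sample_net_charges(n_sequences, length, seed=0):
    """Net charge of n_sequences independent random sequences of a given length."""
    rng = random.Random(seed)
    return [net_charge(random_sequence(length, rng)) for _ in range(n_sequences)]


if __name__ == "__main__":
    length = 70
    charges = sample_net_charges(2000, length)
    # Per residue: +1 with probability 0.1, -1 with probability 0.1, else 0,
    # so the mean is 0 and the variance is 0.2; variances add over residues.
    expected_variance = 0.2 * length
    print("sample mean:", statistics.mean(charges))
    print("sample variance:", statistics.pvariance(charges))
    print("expected variance:", expected_variance)
```

Under these assumptions the sampled net charge clusters around zero with a variance close to 0.2 per residue, which is the kind of behaviour the text attributes to random sequences; natural proteins would not be expected to follow this simple model.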

Summary

Introduction

The question of whether extant proteins are the exquisite result of natural selection or rather random co-polymers slightly edited by evolution has stirred an intense discussion for the last twenty years because of its implications for the origin of life [1], macromolecule aetiology [2,3] and evolution at large [3,4,5]. From the molecular point of view, protein evolution can be viewed as a search and optimization process in sequence space that identifies sequences capable of fulfilling a functional requirement. Extant proteins can be considered a highly specific output of a long and intricate evolutionary history, and they are as unique as the evolutionary pathway that produced them. This perspective has been challenged by several authors who raised the problem of whether and to what extent proteins are the unique product of evolution or a sheer accident [4]. The rationale behind this argument relies on the vastness of sequence space, which grows exponentially with the length of the protein. Some authors put forward the notion that extant proteins are the mere output of a contingent process dictated by the simultaneous interplay of several independent causes, so that extant proteins can be regarded as a frozen accident [1].
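
The exponential growth of sequence space mentioned above can be made concrete with a small calculation. The sketch assumes an alphabet of 20 standard amino acids, so a chain of length L has 20^L possible sequences; the function name and the example lengths are illustrative choices only.

```python
def sequence_space_size(length, alphabet_size=20):
    """Number of distinct sequences of a given length over the alphabet."""
    return alphabet_size ** length


if __name__ == "__main__":
    # Each additional residue multiplies the number of sequences by 20.
    for length in (10, 70, 100):
        size = sequence_space_size(length)
        print(f"L = {length:3d}: {size:.3e} sequences")
```

For a 70-residue chain this gives a number on the order of 10^91, which is why any exhaustive exploration of sequence space is out of the question and why the "frozen accident" argument carries weight.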

Methods

Results

Conclusion