Abstract

Here, we summarize a line of remarkably simple theoretical research aimed at better understanding the chemical logic by which life’s standard alphabet of 20 genetically encoded amino acids evolved. The connection to the theme of this Special Issue, “Protein Structure Analysis and Prediction with Statistical Scoring Functions”, arises because bioinformatics currently lacks an empirical basis for xenoproteins composed largely or entirely of amino acids from beyond the standard genetic code. Our intent is to present new perspectives on existing data from two different frontiers in order to suggest fresh ways in which their findings complement one another. These frontiers are origins/astrobiology research into the emergence of the standard amino acid alphabet, and empirical xenoprotein synthesis.

Highlights

  • For protein structure prediction involving non-canonical amino acids, the most recent, significant advance of which we are aware used a highly sophisticated combination of force field libraries and molecular dynamics simulations to predict structures for 551 peptides [2]

  • Subject to further details provided in the forthcoming CASP14 issue of Proteins, it appears that AlphaFold can fulfill this potential not just for short peptides but for protein sequences hundreds of amino acids in length, and across the universe of protein folds

  • It is not clear that a machine learning algorithm would learn to predict the thermodynamically favorable formation of a covalent bond between two sulfur atoms (a disulfide bridge) from the physicochemical rules learned by studying proteins composed only of the other 19 amino acids (i.e., excluding cysteine). Since disulfide bridges both enabled Anfinsen’s foundational discovery and remain an area of active research with respect to their role in protein folding [14], it seems pertinent to ask how confident we can be that no further phenomena exist within an indefinitely diverse set of non-canonical amino acids (ncAAs) that would modify our understanding of sequence/structure relationships. Even short of anything as extreme as new covalent bonds, we already know that “side-chain and backbone interactions [within ‘natural’ protein sequences] may provide the energetic compensation necessary for populating [hitherto unrecognized]


Summary

Introduction

For protein structure prediction involving non-canonical amino acids (ncAAs [1]), the most recent, significant advance of which we are aware used a highly sophisticated combination of force field libraries and molecular dynamics simulations to predict structures for 551 peptides [2]. It is not clear that a machine learning algorithm would learn to predict the thermodynamically favorable formation of a covalent bond between two sulfur atoms (a disulfide bridge) from the physicochemical rules learned by studying proteins composed only of the other 19 amino acids (i.e., excluding cysteine). Since disulfide bridges both enabled Anfinsen’s foundational discovery and remain an area of active research with respect to their role in protein folding [14], it seems pertinent to ask how confident we can be that no further phenomena exist within an indefinitely diverse set of ncAAs that would modify our understanding of sequence/structure relationships. A closer look at empirical success in incorporating ncAAs demonstrates why the challenge of developing statistical scoring functions for an indefinitely diverse set of ncAAs is both timely and important.

Hundreds of ncAAs Have Already Been Incorporated into Proteins
The “Standard Alphabet” Is Distinctly Non-Random in Simple Ways
A “better”
Genetically
Conclusions
