Abstract

The ongoing explosion of sequence information from current organisms is transforming most aspects of biological research, including protein science. In protein science, the increased availability of sequence information is stimulating molecular investigations into the mechanisms by which protein structure and function have changed over the course of natural evolution. At the same time, many approaches initiated within protein science, including structural and biophysical models, are being used to improve the interpretation of sequence information. In addition, protein engineering and design are enabling explorations of protein structure and function in ways that may be distinct from what has evolved in nature.

This special issue is dedicated to the richness of research currently underway at the interface between protein science and molecular evolution. For example, computational protein design was once considered a purely ab initio approach with no relation to protein evolution. However, these areas are becoming increasingly interactive. The papers in this special issue span a breadth of topics that can be clustered into four general categories. These categories are presented here as a way to highlight some of the interconnections between protein science and molecular evolution. Of note, individual papers are not limited to a single category.

1. Investigating mechanisms of natural protein evolution.
These studies address how the proteins that we observe in current organisms have evolved. For example, what were the sequence changes that occurred in evolution, and how did these alter the structure and function of the evolving proteins? Many approaches and tools can be brought to bear on these questions, including structural intuition and methods for reconstructing likely ancestral proteins.
Reconstruction approaches often generate trees of protein evolution and predict the likely sequence of each ancestor on the tree. These ancestral proteins can then be produced and analyzed experimentally to provide information on the potential mechanisms by which protein properties may have evolved. In this issue, ancestral protein reconstruction was used by Steindel, Theobald and colleagues [DOI: 10.1002/pro.2904] to investigate the evolution of lactate dehydrogenase (LDH) in trichomonad parasites. Their results indicate that current LDH enzymes evolved from an ancestral protein that was specific for malate dehydrogenase (MDH) activity. This study identified mutations that shifted the likely ancestral protein from specific MDH activity to a protein with both LDH and MDH activities. Thus, although enzyme promiscuity is thought to be a hallmark of early enzymes (with evolution proceeding from generalist to specialist), natural specialist enzymes can also evolve into generalists.

Van Hazel, Chang, and co-workers [DOI: 10.1002/pro.2902] examined the molecular evolution of rhodopsin in bowerbirds. This analysis was motivated in part by the observation that all known bowerbird species contain an unusual asparagine at position 83. This asparagine has been found in a variety of dim-light adapted organisms, where it often occurs in concert with a substitution at another otherwise highly conserved site, A292S. Mutations at these positions in bowerbird, chicken and cow rhodopsin indicate that the effects of these substitutions are largely additive and influence both spectral tuning (e.g., the wavelength of maximal absorbance) and the kinetics of light activation.

In an impressive set of engineering and biochemical studies, Cahn, Arnold and colleagues [DOI: 10.1002/pro.2852] demonstrate that monomeric class II ketol-acid reductoisomerase (KARII) can evolve from dimeric KARI by duplication of the C-terminal domain.
This evolutionary mechanism had been postulated based on sequence and structural observations, but it was not clear experimentally whether a single duplication event could result in a functional enzyme. The authors generated four duplicated constructs based on a thermophilic KARII starting gene and observed that two of these exhibited strong enzyme activity. In addition, crystallographic studies of one dimeric variant demonstrated that it formed a structure that was virtually superimposable with KARI enzymes.

In another study aimed at understanding duplication events in protein evolution, Schaeffer, Grishin and co-workers [DOI: 10.1002/pro.2893] explored the duplication and divergence of small motifs, which are difficult to identify by many structural classification strategies. The authors examine how the Evolutionary Classification of protein Domains (ECOD), a tool they developed that uses both sequence and structural information, can be applied to analyze the evolution of small motifs. Using this approach, the authors identify multiple instances where proteins appear to have evolved through the duplication and divergence of small motifs.

Utilizing structural intuition, Theobald [DOI: 10.1002/pro.2919] notes that presenilin, an integral membrane aspartate protease that cleaves proteins within the membrane, is structurally similar to ClC proteins that enable chloride to cross membranes. A recent crystal structure of presenilin showed a bundle of nine helices with a pore in the middle, which was postulated to be a new protein fold because DALI searches did not find a related structure. ClC chloride channels have a similar helical topology and a pore in the middle. The author notes that the pore in presenilin is associated, albeit controversially, with calcium flux.

2. Incorporating biophysics and the interdependence of mutations into evolutionary models.
As sequence databases continue to grow, there is tremendous interest in modeling the evolutionary relationships between the protein sequences in current organisms. While purely sequence-based models are often used because they are simple, growing evidence indicates that incorporating structural and biophysical models can lead to more reliable predictions. A key theme in these discussions is the observation that the effect of an amino acid change often depends on the sequence of the rest of the protein, a feature referred to as epistasis.

Goldstein and Pollock [DOI: 10.1002/pro.2930] provide a perspective on protein evolution and on the distinctions between biophysics-based models of protein evolution and models based purely on sequence and statistics. The authors point out that the impact of an amino acid substitution on the biophysical properties of a protein depends on the sequence of the rest of the protein in a way that is difficult or impossible to capture with methods based purely on protein sequence. This thoughtful perspective makes clear that much work remains in order to interpret protein evolution based on the sequences of current species.

Starr and Thornton [DOI: 10.1002/pro.2897] explore the impact of epistasis on protein evolution. A growing number of studies indicate that epistasis is pervasive in the natural evolution of protein sequences. This thoughtful perspective describes two broad classes of epistatic interactions: specific epistasis, where one amino acid change influences the effects of a small number of other amino acid changes, and nonspecific epistasis, where an amino acid change modifies the effects of mutations at many other positions. The authors point out that specific epistasis imposes stricter constraints and therefore has a more dramatic impact on evolution; when it occurs, it tends to leave a stronger evolutionary mark than nonspecific epistasis.
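As an illustration of the background dependence described above, epistasis between two mutations is commonly summarized as the deviation of the double mutant from the sum of the two single-mutant effects. The sketch below is a minimal example under stated assumptions and is not taken from any paper in this issue: the function names, the tolerance, and all numerical values are hypothetical, and effects are assumed to be measured relative to the same parental background on an additive scale (for example, log fitness or a free-energy change).

```python
"""Toy double-mutant cycle: deviation of a double mutant from additivity."""


def epistasis(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Return the double-mutant effect minus the additive expectation.

    All effects are relative to the same parental background and are
    assumed to be on an additive scale (hypothetical units).
    """
    return effect_ab - (effect_a + effect_b)


def classify(eps: float, tolerance: float = 0.1) -> str:
    """Label the deviation; the tolerance is an arbitrary, assumed noise level."""
    if abs(eps) <= tolerance:
        return "additive"
    return "positive epistasis" if eps > 0 else "negative epistasis"


# Hypothetical effects: (mutation A alone, mutation B alone, A and B together).
examples = {
    "independent sites": (0.5, 0.3, 0.8),
    "B only helps in the presence of A": (0.5, 0.0, 1.2),
    "A and B interfere with each other": (0.5, 0.3, 0.2),
}

for label, (a, b, ab) in examples.items():
    eps = epistasis(a, b, ab)
    print(f"{label}: epsilon = {eps:+.2f} ({classify(eps)})")
```

In this toy picture, a mutation whose single-mutant effect is zero in the parental background can still contribute in combination with another mutation, which is one simple way in which the benefit of a mutation can depend on the background in which it occurs.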
Miton and Tokuriki [DOI: 10.1002/pro.2876] examined the influence of epistasis by compiling and analyzing a dataset from the directed evolution of nine enzymes. The authors compared the effects of mutations in the evolutionary context where they occurred to the effects of the same mutations in the parental background. Approximately half of the mutations that were beneficial in the context of the directed evolution trajectory were of no benefit in the parental genetic background. Positive epistasis was observed both for amino acids that were structurally close together and for those that were far apart.

Chi and Liberles [DOI: 10.1002/pro.2886] consider the potential to understand natural protein evolution from first principles and modeling. They examine the biochemical and biophysical properties of proteins that may be subject to natural selection and the extent to which each of these can be modeled. They consider a broad array of properties including protein folding stability, binding affinity, protein dynamics, and protein expression. While tremendous progress has been made on modeling individual properties, generating a predictive model of natural protein evolution remains a daunting challenge.

Cheron, Shakhnovich and co-workers [DOI: 10.1002/pro.2915] developed a biophysical model of viral expansion in the face of selection pressure from host antibodies. In contrast to previous investigations of viral evolution, the distribution of fitness effects is based on biophysical principles that enable this property to evolve in the model, mimicking the way that fitness effects are known to depend on genetic background in nature. The models predict that the optimal mutation rate depends on the level of stress from antibodies, the genome size, and the burst size (the number of viral particles produced from an infected host cell).
Jackson, Wilke and co-workers [DOI: 10.1002/pro.2920] investigated how structural properties correlate with the relative rate at which a position in a protein evolves. The study focused on both solvent accessibility (a measure of the fraction of each amino acid that is capable of contacting solvent) and contact number (a measure of the density of protein atoms surrounding each residue). Recent studies on two different sets of proteins had made contradictory observations regarding these correlations. The analyses in this work indicate that the level of sequence divergence affects the observed correlations. The authors are careful to indicate that further studies will be needed to examine the generality of this trend and whether it extends to other data sets.

3. Engineering proteins with new structures and functions.
These studies aim to understand the potential of proteins to provide structures and functions that are distinct from those that may be selected during natural evolution. In principle, these studies provide fundamental insights into the connections between biophysics and selection, with potential relevance to relatively rare events in natural evolution where new functions and protein structures arise.

Basanta, Baker and co-workers [DOI: 10.1002/pro.2899] used computational design to explore the potential to engineer “inside-out” proteins with polar interactions in the interior of the protein and hydrophobic surfaces that could be stable in organic solvents. They redesigned the interior of Top7, a de novo designed protein whose core is entirely hydrophobic. The design identified a network of polar amino acids that could be introduced into the core. The resulting protein expressed well and was folded, and, based on NMR structures, three of the five core polar positions formed the predicted hydrogen bonding interactions.
Two of the introduced core polar amino acid changes appeared to interact with solvent, suggesting that negative design principles to limit potential solvent interactions may be important for the prediction of polar side chain interactions in this context.

Schaefer, Bailey and Kossiakoff [DOI: 10.1002/pro.2888] describe an experimental investigation of the solubility-promoting potential of aspartate mutations in an aggregation-prone antibody. Building on the observations that aspartate is highly soluble and has previously been associated with solubilizing antibodies, the authors constructed libraries of aspartate mutations surrounding the antigen recognition site. Using a combination of approaches, including selection experiments and biochemical investigation of individual variants, the authors identified aspartate mutations that increased solubility by more than 20-fold without compromising affinity for ligand.

Khersonsky and Fleishman [DOI: 10.1002/pro.2892] provide a thoughtful review of protein engineering. The authors note that computational design approaches have made tremendous progress, particularly with regard to generating proteins that are stable to unfolding and that are capable of adopting target native structures. The authors consider current challenges, including the design of protein function and of backbone conformations that lack secondary structure elements.

Xia, Blaber and colleagues [DOI: 10.1002/pro.2848] investigated the “folding nucleus” in an engineered symmetric beta-trefoil protein using phi-value analysis. The engineered protein was developed from a natural precursor, and the authors considered how accessible this engineering pathway would be to natural evolution, based on the premise that intermediates in the evolutionary process are more favorable if they are capable of folding.

Hoegler and Hecht [DOI: 10.1002/pro.2871] examined the ability of random proteins to confer copper tolerance in E. coli.
On the order of a million protein sequences were tested for their ability to promote copper tolerance. Isolates that grew on plates containing copper concentrations at which negative control cells failed to grow were sequenced. Analyses of one of these isolates indicated that the resulting protein bound to copper. Cells containing this protein tolerated slightly greater concentrations of copper than control cells lacking the protein.

4. Exploring evolutionary potential using laboratory selection experiments.
Technological advances have greatly increased the throughput with which the effects of mutations can be quantified in laboratory experiments. Many of these approaches use sequencing to monitor the frequency of thousands of mutations before and after a selection event, providing a readout of the impact of each mutant on the selected function. These studies can provide a landscape view of the effects of many mutations on protein function and provide some information regarding functional and evolutionary potential.

Foight and Keating [DOI: 10.1002/pro.2881] describe a deep mutational scan of peptides that bind to TRAFs, cytoplasmic proteins that bind to the tails of Tumor Necrosis Factor Receptor superfamily members. The results of these analyses indicate that different TRAFs exhibit complex preferences for distinct peptides and that simplistic rules do not accurately predict the peptide sequences that will bind most strongly to different TRAFs.

Boucher, Bolon and Tawfik [DOI: 10.1002/pro.2928] compare and contrast results from laboratory evolution experiments and natural evolution. Both areas have been transformed by recent technical advances. Improvements in sequencing efficiency have led to an explosion of available genome sequences as well as information on polymorphism within populations, shedding new light on the process by which mutations occur and propagate in nature.
Improvements in the ability to engineer and track mutations have led to a similar increase in the ability to quantify the effects of thousands of mutations in laboratory experiments. The dramatically different timescales of laboratory and natural evolution must be taken into account when comparing the two. Accordingly, laboratory experiments may be less relevant to understanding long-term inter-species variation, yet insightful with regard to the shorter-term evolution of polymorphism.

Daniel N. A. Bolon
Department of Biochemistry and Molecular Pharmacology
University of Massachusetts Medical School
Worcester, MA 01605
E-mail: Dan.Bolon@umassmed.edu

David Baker
Molecular Engineering and Sciences
University of Washington
Box 351655
Seattle, WA 98195-1655
E-mail: dabaker@u.washington.edu

Dan S. Tawfik
Department of Biological Chemistry
Weizmann Institute of Science
Rehovot 76100, Israel
E-mail: dan.tawfik@weizmann.ac.il
