Abstract

Many computational approaches exist for predicting the effects of amino acid substitutions. Here, we considered whether the protein sequence position class – rheostat or toggle – affects these predictions. The classes are defined as follows: experimentally evaluated effects of amino acid substitutions at toggle positions are binary, while rheostat positions show progressive changes. For substitutions in the LacI protein, all evaluated methods failed two key expectations: toggle neutrals were incorrectly predicted as more non-neutral than rheostat non-neutrals, while toggle and rheostat neutrals were incorrectly predicted to be different. However, toggle non-neutrals were distinct from rheostat neutrals. Since many toggle positions are conserved, and most rheostats are not, predictors appear to annotate position conservation better than mutational effect. This finding can explain the well-known observation that predictors assign disproportionate weight to conservation, as well as the field’s inability to improve predictor performance. Thus, building reliable predictors requires distinguishing between rheostat and toggle positions.

Highlights

  • Chains found in textbooks, which, in turn, are often used to design experimental mutation studies

  • These methods often assume that the variants used for training broadly represent the entire world of variation. Another extrapolation is implicit in any computational method that incorporates MSAs: if a particular variant-effect is known for one homolog, similar outcomes are expected for the same variant in other family members. Both the conventional substitution rules and training datasets are biased by a gap between unbounded evolutionary reality and limited laboratory work; i.e., laboratory variants are subject to experimental limitations and to the interests of the scientists.

  • We recently showed that some functionally-important, non-conserved positions do not follow any of the evolutionary or biochemical assumptions made for conserved positions[11]. In that work, we identified 12 positions in the lactose repressor protein (LacI)/GalR family of proteins that varied widely among family members[8].

Introduction

Chains found in textbooks, which, in turn, are often used to design experimental mutation studies. Another extrapolation is implicit in any computational method that incorporates MSAs: if a particular variant-effect is known for one homolog, similar outcomes are expected for the same variant in other family members. Both the conventional substitution rules and training datasets are biased by a gap between unbounded evolutionary reality and limited laboratory work; i.e., laboratory variants are subject to experimental limitations and to the interests of the scientists. Using the natural E. coli lactose repressor protein (LacI) and modified (synthetic) versions of seven LacI/GalR homologs (including GalR, PurR, and RbsR), we substituted the native amino acid in each of these positions with 5–13 other amino acids and measured functional outcomes[11]. If these positions were functionally important, the conventional rules would predict that only a few similar substitutions would allow normal function and that most others would abolish function (e.g. Fig. 1, left panel). These effects did not correlate with evolutionary frequency, side chain similarities, or functional effects of the same substitutions in homologous proteins[11].
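
The toggle/rheostat distinction described above (binary versus progressive substitution effects at a position) can be illustrated with a small sketch. This is not the authors' classification procedure: the function name `classify_position`, the normalization of activity to a 0–1 scale, and all three thresholds are hypothetical choices made only for illustration, and the example activity values are invented rather than taken from the LacI data.

```python
from typing import Sequence


def classify_position(
    activities: Sequence[float],
    wt_like: float = 0.8,                    # hypothetical cutoff: at or above this is "wild-type-like"
    dead: float = 0.2,                       # hypothetical cutoff: at or below this is "no function"
    min_intermediate_fraction: float = 0.5,  # hypothetical share of intermediates needed for "rheostat"
) -> str:
    """Label a functionally important position as 'toggle' or 'rheostat'.

    `activities` holds one measured functional value per substitution,
    assumed to be normalized so that wild type = 1.0 and no function = 0.0.
    The sketch assumes at least one substitution at the position reduces
    function; it does not attempt to recognize neutral positions.
    """
    if not activities:
        raise ValueError("need at least one measured variant")

    # Count substitutions whose outcome is neither wild-type-like nor dead.
    intermediate = [a for a in activities if dead < a < wt_like]
    fraction = len(intermediate) / len(activities)

    # Mostly intermediate outcomes -> progressive (rheostat) behaviour;
    # otherwise outcomes cluster at the two extremes (toggle behaviour).
    return "rheostat" if fraction >= min_intermediate_fraction else "toggle"


# Invented example data, for illustration only.
toggle_like = [1.0, 0.95, 0.05, 0.0, 0.02, 0.01]
rheostat_like = [1.0, 0.7, 0.55, 0.4, 0.25, 0.1]

print(classify_position(toggle_like))
print(classify_position(rheostat_like))
```

A fixed-threshold rule like this is the simplest way to separate the two patterns; a real analysis would need cutoffs justified by assay noise and the number of substitutions measured at each position.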
