Identifying amino acid substitutions that enhance both the stability and the function of a protein is a key challenge in protein engineering. Technological advances have made it possible to assay thousands of protein variants in a single high-throughput experiment, and recent studies have begun to use such data for protein engineering. We present Global Multi-Mutant Analysis (GMMA), a method that exploits the presence of multiply-substituted variants to identify individual amino acid substitutions that are beneficial for stability and function across a large library of protein variants. We applied GMMA to a previously published experiment reporting on >54,000 variants of green fluorescent protein (GFP), each with known fluorescence output and each carrying 1–15 amino acid substitutions (Sarkisyan et al., 2016). GMMA achieves a good fit to this dataset while remaining analytically transparent. We show experimentally that the six top-ranking substitutions progressively enhance GFP. More broadly, using only a single experiment as input, our analysis recovers nearly all the substitutions previously reported to be beneficial for GFP folding and function. In conclusion, we suggest that large libraries of multiply-substituted variants may provide a unique source of information for protein engineering.