Abstract

Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.

Highlights

  • Exploring the relationship between sequence and function is fundamental to enhancing our understanding of biology, evolution, and genetically driven disease

  • Variant standard errors are calculated for each selection and replicate score, allowing the experimenter to remove noisy variants or perform hypothesis testing

  • One alternative is to combine replicate scores using a fixed-effect model [29]. We examined this approach for the BRCA1 E3 ubiquitin ligase dataset (Fig. 4) and found that because variant scores can vary widely between replicates, this method dramatically underestimates the standard error of the combined variant score
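The last highlight can be illustrated with a small numerical sketch. The fixed-effect combination below is the standard inverse-variance weighted mean. The random-effects comparison uses the DerSimonian-Laird estimator, which is chosen here only as a convenient stand-in and is not claimed to be the estimator used in Enrich2. The replicate scores, standard errors, and function names are all hypothetical.

```python
import math


def fixed_effect(scores, ses):
    """Inverse-variance weighted mean of replicate scores and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, scores)) / total
    return mean, math.sqrt(1.0 / total)


def random_effects_dl(scores, ses):
    """DerSimonian-Laird random-effects estimate (illustrative stand-in only)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    fe_mean = sum(w * s for w, s in zip(weights, scores)) / total
    # Cochran's Q measures how much the replicates disagree beyond sampling error.
    q = sum(w * (s - fe_mean) ** 2 for w, s in zip(weights, scores))
    c = total - sum(w ** 2 for w in weights) / total
    # Between-replicate variance, truncated at zero.
    tau2 = max(0.0, (q - (len(scores) - 1)) / c)
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_total = sum(re_weights)
    mean = sum(w * s for w, s in zip(re_weights, scores)) / re_total
    return mean, math.sqrt(1.0 / re_total)


def z_test_p(score, se, null=0.0):
    """Two-sided p-value for score != null under a normal approximation."""
    z = (score - null) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical variant: three replicates with small per-replicate standard
# errors but widely differing scores.
scores = [0.5, 1.0, 1.5]
ses = [0.1, 0.1, 0.1]

fe_mean, fe_se = fixed_effect(scores, ses)
re_mean, re_se = random_effects_dl(scores, ses)

print(f"fixed effect:   score={fe_mean:.3f} SE={fe_se:.3f} p={z_test_p(fe_mean, fe_se):.2e}")
print(f"random effects: score={re_mean:.3f} SE={re_se:.3f} p={z_test_p(re_mean, re_se):.2e}")
```

With these made-up inputs both approaches return the same combined score, but the fixed-effect standard error ignores the disagreement between replicates and is several times smaller. Any hypothesis test built on it would therefore be overconfident.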

Summary

Introduction

Exploring the relationship between sequence and function is fundamental to enhancing our understanding of biology, evolution, and genetically driven disease. Deep mutational scanning has greatly enhanced our ability to probe the protein sequence-function relationship [1] and has become widely used [2]. It has been applied to comprehensive interpretation of variants found in disease-related human genes [3, 4], understanding protein evolution [5, 6, 7, 8, 9], and probing protein structure [10, 11], with many additional possibilities on the horizon [2]. Fundamental gaps remain in our ability to use deep mutational scanning data to accurately measure the effect of each variant because practitioners lack a unifying statistical framework within which to interpret their results. Two established implementations of deep mutational scanning scoring methods, Enrich [19] and EMPIRIC [20], calculate variant scores based on the ratio of variant
