Abstract
Discerning how a mutation affects the stability of a protein is central to the study of a wide range of diseases. Mutagenesis experiments on physical proteins provide precise insights into the effects of amino acid substitutions, but such studies are prohibitively expensive in both time and cost. Computational approaches that tell experimentalists where to allocate wet-lab resources are available, including a variety of machine learning models. Assessing the accuracy of machine learning models for predicting the effects of mutations depends on in vitro experiments for amino acid substitutions. Even when similar experiments on physical proteins have been performed by multiple laboratories, the use of data near the juncture of stabilizing and destabilizing mutations is questionable. In this work, we explore a systematic and principled alternative to discarding experimental data close to that juncture. We model the inconclusive range of experimental [Formula: see text] values via 3- and 5-way classifiers, and systematically explore potential boundaries for the range of inconclusive experimental values. We demonstrate the effectiveness of potential boundaries through confusion matrices and heat map visualizations. We explore two novel metrics for assessing viable cutoff ranges, and find that under these metrics, a lower cutoff near [Formula: see text] and an upper cutoff near [Formula: see text] are optimal across multiple machine learning models.
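To make the 3-way scheme concrete, the following is a minimal sketch of labeling values with an inconclusive band and tabulating a confusion matrix. It is not the authors' implementation: the function names, the sample values, the cutoffs of -0.5 and 0.5, and the sign convention (values below the lower cutoff treated as destabilizing) are all assumptions chosen for illustration, and the cutoffs reported in the abstract are not reproduced here.

```python
from collections import Counter

# Class order used for the rows (true) and columns (predicted) of the matrix.
LABELS = ("destabilizing", "inconclusive", "stabilizing")


def three_way_label(value, lower, upper):
    """Label a value using an inconclusive band [lower, upper].

    Assumed sign convention (hypothetical): values below `lower` are
    destabilizing and values above `upper` are stabilizing.
    """
    if lower > upper:
        raise ValueError("lower cutoff must not exceed upper cutoff")
    if value < lower:
        return "destabilizing"
    if value > upper:
        return "stabilizing"
    return "inconclusive"


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical experimental and model-predicted values, for illustration only.
experimental = [-2.1, -0.3, 0.1, 0.8, 1.9]
predicted = [-1.5, 0.2, -0.9, 1.2, 0.4]

# Hypothetical cutoffs; not the values found in the paper.
lower, upper = -0.5, 0.5

true_labels = [three_way_label(v, lower, upper) for v in experimental]
pred_labels = [three_way_label(v, lower, upper) for v in predicted]
matrix = confusion_matrix(true_labels, pred_labels)

for label, row in zip(LABELS, matrix):
    print(f"{label:>14}: {row}")
```

Sweeping `lower` and `upper` over a grid and recomputing the matrix at each pair is one way to produce the kind of heat map the abstract mentions; the choice of summary metric per cell is left open here.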
More From: Journal of bioinformatics and computational biology