Statistical Learning from Single-Molecule Experiments: Support Vector Machines and Expectation-Maximization Approaches to Understanding Protein Unfolding Data.

Farkhad Maksudov, Lee K Jones, Valeri Barsegov

doi:10.1021/acs.jpcb.1c02334

Abstract

Single-molecule force spectroscopy has become a powerful tool for the exploration of dynamic processes that involve proteins; yet, meaningful interpretation of the experimental data remains challenging. Owing to the low signal-to-noise ratio, experimental force-extension spectra contain force signals due to nonspecific interactions, tip or substrate detachment, and protein desorption. Unraveling of complex protein structures results in unfolding transitions of different types. Here, we test the performance of Support Vector Machines (SVM) and Expectation Maximization (EM) approaches in statistical learning from dynamic force experiments. When the output from molecular modeling in silico (or other studies) is used as a training set, SVM and EM can be applied to understand the unfolding force data. The maximal margin or maximum likelihood classifier can be used to separate experimental test observations into unfolding transitions of different types, and EM optimization can then be utilized to resolve the statistics of unfolding forces: weights, average forces, and standard deviations. We also designed an EM-based approach that can be applied directly to the experimental data, without data classification and division into training and test observations. This approach performs well even when the sample size is small and when the unfolding transitions are characterized by overlapping force ranges.
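To make the EM step concrete, the following is a minimal sketch of how EM can resolve weights, average forces, and standard deviations from a pooled sample of unfolding forces. It is not the authors' implementation: it assumes that each transition type contributes a Gaussian-distributed force, which the abstract does not state, and the function name `em_gaussian_mixture_1d`, the synthetic force values, and all parameter choices are hypothetical.

```python
import numpy as np

def em_gaussian_mixture_1d(forces, n_components=2, n_iter=200, tol=1e-8):
    """Fit a 1D Gaussian mixture to unfolding forces with EM.

    Illustrative only: Gaussian components are an assumption of this sketch.
    Returns (weights, means, stds), sorted by increasing mean force.
    """
    x = np.asarray(forces, dtype=float)
    n = x.size
    # Initialization: equal weights, means at evenly spaced quantiles,
    # and the overall sample spread for every component.
    weights = np.full(n_components, 1.0 / n_components)
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    stds = np.full(n_components, x.std() + 1e-12)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component
        # for each observed force, computed in log space for stability.
        z = (x[:, None] - means[None, :]) / stds[None, :]
        log_pdf = -0.5 * z**2 - np.log(stds[None, :]) - 0.5 * np.log(2 * np.pi)
        log_joint = np.log(weights[None, :]) + log_pdf
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: re-estimate weights, average forces, standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means[None, :]) ** 2).sum(axis=0) / nk
        stds = np.sqrt(np.maximum(var, 1e-12))
        # Stop when the log-likelihood no longer improves.
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    order = np.argsort(means)
    return weights[order], means[order], stds[order]

# Synthetic example (hypothetical values, in pN): two transition types
# mixed in a 60:40 ratio.
rng = np.random.default_rng(0)
forces = np.concatenate([
    rng.normal(100.0, 10.0, 300),  # hypothetical transition type 1
    rng.normal(160.0, 15.0, 200),  # hypothetical transition type 2
])
weights, means, stds = em_gaussian_mixture_1d(forces, n_components=2)
print("weights:", weights)
print("average forces:", means)
print("standard deviations:", stds)
```

In this sketch no observation is ever assigned a hard label; each force contributes to every component in proportion to its responsibility, which is why a mixture fit of this kind can in principle be applied to unlabeled data with overlapping force ranges.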

Full Text