Abstract

Differential scanning calorimetry (DSC) is a powerful technique for studying temperature-induced phase transitions by monitoring changes in heat capacity. Traditional DSC data analysis requires manual baseline subtraction and peak analysis, which inevitably introduces errors caused by human bias. To tackle this long-standing challenge, we propose an automated method for DSC signal extraction and baseline estimation based on semi-supervised machine learning. We implement an exponentially modified Gaussian mixture (EMGM) model to identify the signal of interest, and use the expectation–maximization algorithm to maximize the log-likelihood of the model. This method is then combined with iterative polynomial fitting for baseline estimation. One advantage of the method is that it requires no prior knowledge of the signal, which is learned through matrix factorization. We demonstrate the method's efficacy using three types of protein data measured on different DSC instruments. The method effectively identifies the signals of interest in the raw data and performs proper baseline subtraction. Furthermore, the program accurately extracts thermodynamic parameters from the peak signals for thermal characterization. In summary, the automated signal processing method presented in this work improves the speed and accuracy of DSC data interpretation. The code and data for this work can be found at: https://github.com/shuyu-wang/DSC_data_analysis/.
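
To make the baseline step concrete, the following is a minimal sketch of one common form of iterative polynomial fitting, in which the signal is repeatedly fitted with a polynomial and clipped to lie at or below the fit. It is not taken from the authors' repository: the function name, the parameters (degree, max_iter, tol), and the clipping rule are illustrative assumptions, and the sketch assumes that peaks point upward relative to the baseline.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def iterative_polynomial_baseline(x, y, degree=3, max_iter=100, tol=1e-6):
    """Estimate a smooth baseline under upward-pointing peaks.

    Hypothetical helper, not the authors' implementation: a generic
    fit-and-clip scheme. Each pass fits a polynomial to the working
    signal, then clips the working signal to min(signal, fit), so the
    peak regions progressively stop pulling the fit upward.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Rescale x to [-1, 1] so the least-squares problem stays well conditioned.
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

    work = y.copy()
    fit = work
    for _ in range(max_iter):
        coeffs = P.polyfit(xs, work, degree)
        fit = P.polyval(xs, coeffs)
        clipped = np.minimum(work, fit)
        # Stop once clipping no longer changes the working signal noticeably.
        change = np.linalg.norm(clipped - work)
        work = clipped
        if change <= tol * max(np.linalg.norm(work), 1e-12):
            break
    return fit


if __name__ == "__main__":
    # Synthetic thermogram (assumed values): linear baseline plus one Gaussian peak.
    temperature = np.linspace(20.0, 100.0, 400)
    true_baseline = 0.5 + 0.01 * temperature
    peak = 2.0 * np.exp(-0.5 * ((temperature - 60.0) / 3.0) ** 2)
    baseline = iterative_polynomial_baseline(temperature, true_baseline + peak, degree=1)
    corrected = true_baseline + peak - baseline
    print(float(np.max(np.abs(baseline - true_baseline))))
```

The polynomial degree is the main choice in a scheme like this: a low degree cannot follow a curved baseline, while a high degree can start absorbing broad peaks into the baseline.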
