Application of Machine Learning Algorithms to Estimate Enzyme Loading, Immobilization Yield, Activity Retention, and Reusability of Enzyme–Metal–Organic Framework Biocatalysts

Milton Chai, Amir Razmjou, Eila Erfani, Sina Moradi, Vicki Chen, Mohsen Asadnia

doi:10.1021/acs.chemmater.1c02476

Abstract

The ability to predict enzyme–metal–organic framework (MOF) properties such as enzyme loading, immobilization yield, activity retention, and reusability can maximize product yield and extend the operational life of enzyme–MOF biocatalysts. However, this is challenging due to the vast number of combinations of available metal and ligand building blocks for MOFs and enzymes. Therefore, several machine learning (ML) algorithms were applied in this study, using data collected from 127 journal articles, to estimate these biocatalyst parameters. Twelve input variables, including the metal and ligand properties of the MOF as well as the enzyme properties, were integrated and fed into two ML algorithms, random forest and Gaussian process regression (GPR), to predict the model outputs. A 10-fold cross-validation approach with grid search was applied to obtain the optimal hyperparameter values. The random forest model (RFM) provided more accurate estimates of the enzyme loading, immobilization yield, and reusability of the biocatalyst than the GPR model, with relatively high R² values of 0.85, 0.77, and 0.91, respectively. Both models were less effective at predicting enzyme activity retention, however, with R² values of 0.63 or lower. Sensitivity analysis of the input variables revealed the most significant variables for each output parameter, allowing further optimization of the RFM. The final RFM was then tested with a second, unseen dataset collected from experiments. The findings confirmed the validity of the predictive model, with a relative error of less than 25%. Our model can aid in the synthesis of enzyme–MOF biocatalysts by providing valuable estimates of these output parameters for different MOF precursors and enzymes, saving experimental time and cost.
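To make the model-selection procedure concrete, the following is a minimal sketch of 10-fold cross-validation with grid search, scored by R², together with the relative-error measure used for the unseen test set. It is not the authors' implementation: a simple k-nearest-neighbour regressor stands in for the random forest and GPR models so that the example needs only the Python standard library, the data are synthetic, and all function names (`cross_val_r2`, `grid_search`, `relative_error`, and so on) are hypothetical.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_X, train_y, x, k):
    """Stand-in regressor (assumption): mean target of the k nearest points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_val_r2(X, y, k, n_folds=10):
    """R^2 computed on the pooled out-of-fold predictions."""
    preds = [0.0] * len(X)
    for held_out in k_fold_indices(len(X), n_folds):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        for i in held_out:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    return r2_score(y, preds)


def grid_search(X, y, grid, n_folds=10):
    """Score every candidate hyperparameter value and return the best one."""
    scores = {k: cross_val_r2(X, y, k, n_folds) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores


def relative_error(predicted, measured):
    """Absolute deviation of a prediction as a fraction of the measured value."""
    return abs(predicted - measured) / abs(measured)


# Synthetic data for illustration only (not taken from the study).
rng = random.Random(42)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(100)]
y = [3 * a + 2 * b + rng.gauss(0, 0.1) for a, b in X]

best, scores = grid_search(X, y, grid=[1, 3, 5, 9])
print("best k:", best)
print("cross-validated R^2:", round(scores[best], 3))
print("relative error:", relative_error(predicted=80.0, measured=100.0))
```

In this sketch the grid holds candidate values of a single hyperparameter (the number of neighbours); for a random forest the grid would instead span settings such as the number of trees or the tree depth, with the same cross-validated scoring deciding between them.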

Full Text