Abstract

Aqueous solubility is an important property to consider when carrying out chemical reactions with a compound. In this research, we develop several machine learning models to predict the aqueous solubility of molecules. The models were developed on the public dataset AqSolDB, which contains 9982 solubility data points. Several tree-based machine learning regression models were trained on the dataset, and their performance was evaluated using mean absolute error (MAE). The best model for solubility prediction was the Categorical Boosting (CatBoost) Regressor, which achieved an MAE of 0.854. The importance of each feature for solubility was also obtained from the model, and it shows that the variable MolLogP is strongly correlated with solubility. To further improve the model, we selected a subset of features using a genetic algorithm and trained several tree-based models on the selected features. The lowest MAE, 0.771, was again obtained with the CatBoost Regressor, an improvement over the earlier result without feature selection.
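The abstract does not give implementation details, so the following is only a minimal sketch of how genetic-algorithm feature selection scored by MAE might look. It uses the Python standard library only. The function names, the population settings, and `toy_score` are all assumptions made for illustration; in particular, `toy_score` is a hypothetical stand-in for the real fitness, which would train a tree-based regressor on the selected columns and return its validation MAE.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_features(n_features, score, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Search for a 0/1 feature mask that minimizes score(mask).

    Illustrative genetic algorithm (truncation selection, one-point
    crossover, bit-flip mutation); not the authors' exact procedure.
    """
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    best = min(population, key=score)
    for _ in range(generations):
        # Keep the better-scoring half of the population as parents.
        parents = sorted(population, key=score)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate)
                          for bit in child)
            children.append(child)
        population = children
        # Elitism: never lose the best mask seen so far.
        best = min(population + [best], key=score)
    return best


def toy_score(mask):
    # Hypothetical fitness: pretends features 0 and 2 are informative.
    # A real fitness would fit a model and return its validation MAE.
    informative = (1, 0, 1, 0, 0, 0)
    missing = sum(1 for m, i in zip(mask, informative) if i and not m)
    extra = sum(1 for m, i in zip(mask, informative) if m and not i)
    return 0.5 + 0.3 * missing + 0.05 * extra


best_mask = select_features(6, toy_score)
print(best_mask, toy_score(best_mask))
```

Because each fitness evaluation in a real setting means training a model, the population size and number of generations largely determine the cost of the search.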
