Abstract

Ionic liquids (ILs) are considered greener alternatives to traditional organic solvents because of their unique physical and chemical properties. Nevertheless, recent studies have shown that ILs can induce toxic effects in ecosystems. It is therefore essential to determine the level of risk they pose to aquatic life before these ILs can be used successfully. Measuring the toxicity of many ILs across a broad spectrum of conditions by experiment is demanding in time and resources, and is at times impractical. Various quantitative structure-activity/property relationship (QSAR/QSPR) studies have addressed the prediction of IL toxicity expressed as EC50. In this study, five supervised machine learning models were trained and tested using nine Principal Properties (PPs) as descriptors to predict cytotoxicity toward the rat leukemia cell line IPC-81. Eight feature selection techniques were then used to preprocess the data in order to improve the performance of the best of the preliminary models. Analysis of model performance on the out-of-sample data set showed that Extreme Gradient Boosting (XGBoost) was the best predictor, with the highest test score (R2 = 0.79). This model was also the most parsimonious (minimum AIC of 46.50), consistent (minimum RMSE of 0.45), and precise (minimum MAE of 0.32) in predicting IPC-81 cytotoxicity. The feature importance attribute of XGBoost confirmed that structural features of the IL cation, such as cationic hydrophilicity and side chain length, have a significant impact on toxicity. Nevertheless, the anionic part of the IL also contributes to toxicity and needs to be considered in toxicity prediction. Among the tested feature selection techniques, the random forest technique gave the greatest improvement in model performance (i.e., the lowest error metrics: AIC = 41.22, MAE = 0.31, and RMSE = 0.4259), but at a longer execution time. The wrapper methods, however, were the most robust in improving computational efficiency (i.e., they improved model performance at the shortest execution time). This study therefore advances QSPR studies on toxicity prediction of new ILs through the application of machine learning and feature selection techniques.
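
For readers unfamiliar with the four scores quoted above (R2, RMSE, MAE, and AIC), the following is a minimal sketch of how they might be computed from observed and predicted values. The abstract does not state which AIC formulation was used; the least-squares form n·ln(RSS/n) + 2k is assumed here, and the function name `regression_metrics`, the parameter `n_params`, and the toy numbers are hypothetical and not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred, n_params):
    """Return R2, RMSE, MAE and a least-squares AIC for one set of predictions.

    n_params is the number of fitted parameters (or selected descriptors)
    charged to the model; how to count it for a tree ensemble is a modelling
    choice and is treated here as an assumption.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    rss = sum(r * r for r in residuals)            # residual sum of squares
    mean_y = sum(y_true) / n
    tss = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares

    return {
        "r2": 1.0 - rss / tss,
        "rmse": math.sqrt(rss / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # Assumed AIC form for Gaussian errors: n * ln(RSS / n) + 2k
        "aic": n * math.log(rss / n) + 2 * n_params,
    }


# Toy values for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = regression_metrics(observed, predicted, n_params=2)
print(scores)
```

Under this formulation a lower AIC rewards a model that fits well with fewer parameters, which is the sense in which the abstract calls the lowest-AIC model the most parsimonious.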
