Pharmaceutical compounds have become an increasingly important source of pollutants in wastewater. Because conventional treatments are ineffective at removing them, they are commonly discharged into the environment. Pharmaceuticals can be successfully removed using liquid-liquid extraction, and COSMO-RS can be used to predict interactions and identify the most promising solvents. However, COSMOtherm models cannot account for key process parameters, which reduces their accuracy. Therefore, alternative computational approaches are needed that incorporate both process and interaction variables to accurately predict the extraction yields of pharmaceuticals. This work used machine learning to predict the extraction yield of eleven pharmaceuticals using eight solvents. Six regression models and two classification models were explored. The best performance was obtained with the ANN regressor (test MAE: 4.510, test R²: 0.884) and the RF classifier (test accuracy: 0.938, test recall: 0.974). The RF regression and classification analyses also identified the key features for extraction yield: solvent-to-feed ratio, n-octanol–water partition coefficient, hydrogen-bond and van der Waals contributions to excess enthalpy, and pH distance to the nearest pKa. Machine learning proved to be an excellent tool for screening and selecting the most promising solvents and process conditions for removing pharmaceuticals from wastewater.
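Of the key features listed, "pH distance to nearest pKa" is the one whose calculation is least obvious from its name. The sketch below shows one plausible way to compute it. It assumes the feature is the absolute difference between the solution pH and the closest pKa of the compound, which the abstract does not confirm. The function name and the example pKa values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def ph_distance_to_nearest_pka(ph: float, pkas: Sequence[float]) -> float:
    """Absolute distance between the solution pH and the closest pKa.

    Assumed definition (not confirmed by the abstract):
    min over i of |pH - pKa_i|.
    """
    if not pkas:
        raise ValueError("at least one pKa value is required")
    return min(abs(ph - pka) for pka in pkas)


# Hypothetical compound with two ionizable groups (illustrative values only).
example_pkas = [4.0, 9.5]
example_distance = ph_distance_to_nearest_pka(7.0, example_pkas)
```

Under this assumed definition, a small distance means the compound is close to a protonation transition, where its ionized fraction, and therefore its affinity for the organic phase, changes most sharply with pH.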