Reverse osmosis and nanofiltration are used to purify feedwaters that contain a range of harmful organic solutes. The rejection of many of these solutes is poorly understood because our ability to measure the removal of any given compound experimentally is limited. In this work, we present a machine learning approach that predicts organic solute rejection using molecular fingerprints, which encode chemical structure features, such as functional groups and rings, into simple binary vectors. We trained machine learning models on a database of 1906 membrane rejection measurements covering 228 organic compounds and 39 types of reverse osmosis and nanofiltration membranes. Three types of molecular fingerprint (structural key, circular, and path-based) were compared. The Molecular Access System (MACCS) structural key performed well (coefficient of determination of 0.87 on the testing set), was fast to calculate because of its short bit length, and was easy to interpret. In addition to evaluating prediction performance, we applied Shapley Additive Explanations (SHAP) analysis to gain a better molecular-scale understanding of membrane rejection by identifying the molecular substructures that are important in determining solute rejection. Overall, this work presents a method for predicting the rejection of compounds that relies only on readily available molecular structure information and improves our ability to understand rejection mechanisms.
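To make the idea of a structural key fingerprint concrete, the following is a minimal sketch of how the presence or absence of a few structural features can be encoded as a binary vector. It is not the MACCS key set and not the method used in this work: the four keys, the names `TOY_KEYS` and `toy_structural_key`, and the crude string checks on SMILES input are all assumptions made for illustration. A real structural key is computed by proper substructure matching in a cheminformatics toolkit, and the checks below are only meaningful for simple, uncharged organic SMILES.

```python
# Toy structural-key fingerprint: one bit per predefined structural feature.
# ASSUMPTION: these four keys are hypothetical and hand-picked; they are not
# taken from the MACCS definitions or from the paper.
# ASSUMPTION: features are detected with naive string checks on SMILES, which
# only works for simple organic molecules without charges, isotopes, or metals.

TOY_KEYS = [
    # Ring-closure digits in SMILES indicate at least one ring.
    ("has_ring", lambda s: any(ch.isdigit() for ch in s)),
    # "=O" marks an oxygen double-bonded to some atom (e.g. a carbonyl).
    ("has_double_bonded_oxygen", lambda s: "=O" in s),
    # Aliphatic (N) or aromatic (n) nitrogen.
    ("has_nitrogen", lambda s: "N" in s or "n" in s),
    # Any of the common halogens.
    ("has_halogen", lambda s: any(x in s for x in ("F", "Cl", "Br", "I"))),
]


def toy_structural_key(smiles):
    """Return a list of 0/1 bits, one per entry in TOY_KEYS."""
    return [1 if check(smiles) else 0 for _, check in TOY_KEYS]


if __name__ == "__main__":
    examples = {
        "phenol": "c1ccccc1O",
        "acetic acid": "CC(=O)O",
        "chloroform": "ClC(Cl)Cl",
        "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    }
    for name, smiles in examples.items():
        print(name, toy_structural_key(smiles))
```

Each bit position has a fixed meaning. This is the property that makes structural keys easy to interpret: a feature attribution assigned to a bit can be read directly as the contribution of the corresponding substructure.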