Currently, there are no accurate models for predicting diffusion coefficients at infinite dilution in aqueous systems. Models that work well for polar solvents frequently perform worse for water. At the same time, experimental tracer diffusion coefficients are scarce, and measuring them can be impractical when this important transport property is needed. In this work, machine learning models were developed to predict the tracer diffusion coefficient of any solute in water at atmospheric pressure. Several approaches to constructing the model were explored, using different types of input parameters: pure component properties and theoretical molecular descriptors (such as atom counts, structural fragments and fingerprints) computed from different sources. A database of 126 systems (1192 data points) was used for training, and the best model achieved a global average absolute relative deviation (AARD) of 3.92%, with a maximum deviation of 24.27%, on the test set. This model takes as inputs the temperature and 195 molecular descriptors computed with the RDKit cheminformatics package; because these descriptors can be calculated automatically from a molecular identifier, the model is very simple to use. In comparison, the well-known Wilke-Chang equation gave an AARD of 13.03% on the same test set, demonstrating the improved accuracy of the proposed approach. The models developed in this work are available at github.com/EgiChem/ml-D12-water-app.
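The comparison above rests on two quantities: the AARD metric and the Wilke-Chang baseline. The sketch below shows both, assuming the common textbook form of Wilke-Chang (D12 in cm²/s, solute molar volume at its normal boiling point in cm³/mol, solvent viscosity in cP, association factor 2.6 for water) and AARD defined as the mean of |predicted − experimental| / experimental, in percent. The function names and the example inputs are hypothetical illustrations, not the authors' code or data.

```python
import math


def wilke_chang(T, V_A, eta_B, M_B=18.015, phi=2.6):
    """Wilke-Chang estimate of the infinite-dilution diffusivity D12 (cm^2/s).

    T     : temperature (K)
    V_A   : solute molar volume at its normal boiling point (cm^3/mol)
    eta_B : solvent viscosity (cP)
    M_B   : solvent molar mass (g/mol); default is water
    phi   : solvent association factor; 2.6 is the usual value for water
    """
    return 7.4e-8 * math.sqrt(phi * M_B) * T / (eta_B * V_A ** 0.6)


def aard(predicted, experimental):
    """Average absolute relative deviation, in percent."""
    if len(predicted) != len(experimental) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) / e for p, e in zip(predicted, experimental))
    return 100.0 * total / len(experimental)


if __name__ == "__main__":
    # Hypothetical solute: V_A = 50 cm^3/mol in water at 298.15 K (eta ~ 0.89 cP)
    d12 = wilke_chang(298.15, 50.0, 0.89)
    print(f"Wilke-Chang D12 ~ {d12:.2e} cm^2/s")
    # Hypothetical predicted vs experimental values
    print(f"AARD = {aard([1.1e-5, 0.9e-5], [1.0e-5, 1.0e-5]):.2f} %")
```

For the hypothetical solute this gives a D12 on the order of 10⁻⁵ cm²/s, the typical magnitude for small solutes in water. A model's AARD is computed the same way regardless of whether its predictions come from an empirical correlation or a machine learning model, which is what makes the 3.92% versus 13.03% comparison on a common test set meaningful.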