Abstract

Estimating saturated hydraulic conductivity $$K_f$$ from particle size distributions (PSD) with empirical formulas is common practice, whereas machine learning is not yet widely established for this purpose. We evaluate the predictive power of six machine learning algorithms, including tree-based, regression-based and network-based methods, in estimating $$K_f$$ from the PSD alone. We use a dataset of 4600 samples from the shallow Dutch subsurface for training and testing. This extensive dataset provides not only PSD but also conductivities measured in permeameter tests. Besides training and testing on the entire dataset, we apply the six algorithms to data subsets for the soil types sand, silt and clay. We further test different feature/target-variable combinations, such as reducing the input to the PSD-derived grain diameters $$d_{10}$$, $$d_{50}$$ and $$d_{60}$$, or estimating porosity from PSD. We assess feature importance and compare results to $$K_f$$ estimates from a selection of empirical formulas. We find that all algorithms can estimate $$K_f$$ from PSD with high accuracy ($$R^2$$/NSE of up to 0.89 for testing data and 0.98 for the entire dataset) and outperform empirical formulas. Tree-based algorithms are particularly well suited and robust. Reducing the information in the feature variables to grain diameters works well for predicting $$K_f$$ of sandy samples, but is less robust for silt- and clay-rich samples. In this setting, $$d_{10}$$ proves to be the most influential feature. An interesting, but not surprising, outcome is that PSD is not a suitable predictor of porosity. Overall, our results confirm that machine learning algorithms are a powerful tool for determining $$K_f$$ from PSD. This is promising for applications such as deep-drilling datasets or low-effort, robust $$K_f$$ estimation for single samples.
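As a rough illustration of two quantities named in the abstract, the sketch below derives characteristic grain diameters such as $$d_{10}$$ from a cumulative PSD and computes the Nash-Sutcliffe efficiency (NSE) used as an accuracy score. The function names, the log-linear interpolation between sieve sizes, and the input layout are assumptions made for this example; the abstract does not state how the study itself computes these values.

```python
import numpy as np


def grain_diameter(sieve_mm, percent_passing, p):
    """Diameter (mm) at which p percent of the sample mass is finer.

    Assumption: log-linear interpolation between sieve sizes, a common
    convention that is not confirmed by the abstract.
    """
    sieve_mm = np.asarray(sieve_mm, dtype=float)
    percent_passing = np.asarray(percent_passing, dtype=float)
    # np.interp needs increasing x values, so sort by percent passing.
    order = np.argsort(percent_passing)
    log_d = np.interp(p, percent_passing[order], np.log10(sieve_mm[order]))
    return 10.0 ** log_d


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Called with a hypothetical sieve curve, `grain_diameter(sizes, passing, 10)` returns $$d_{10}$$, and the same call with 50 or 60 returns $$d_{50}$$ or $$d_{60}$$. An NSE of 1 means the predictions match the observations exactly, while 0 means they do no better than the observed mean.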