Machine-learning algorithms have long aided geologic property prediction from well-log data, but they are primarily used to classify lithology, facies, formation, and rock type. More detailed properties (e.g., porosity, grain size) that are important for evaluating hydrocarbon exploration and development activities, as well as subsurface geothermal, CO2 sequestration, and hydrological studies, have not been a focus of machine-learning predictions. This study focuses on improving machine-learning regression-based workflows for quantitative geological property prediction (porosity, grain size, XRF geochemistry), using a robust dataset from the Dad Sandstone Member of the Lewis Shale in the Green River Basin, Wyoming. Twelve slabbed cores collected from wells targeting turbiditic sandstones and mudstones of the Dad Sandstone Member provide 1212.2 ft of well-log and core data to test the efficacy of five machine-learning models, ranging in complexity from multivariate linear regression to deep neural networks. Our results demonstrate that gradient-boosted decision-tree models (e.g., CatBoost, XGBoost) are flexible in terms of input data completeness, do not require scaled data, and are reliably accurate, with the lowest or second-lowest root mean squared error (RMSE) in every test. Deep neural networks, while commonly used for these applications, never achieved the lowest error in any test. We also use newly collected XRF geochemistry and grain-size data to constrain spatiotemporal sediment routing, sand-mud partitioning, and paleo-oceanographic redox conditions in the Green River Basin. Test-train dataset splitting traditionally uses randomized inter-well data, but a blind-well testing strategy is more applicable to most geoscience applications, which aim to predict properties at new, unseen well locations.
We find that models evaluated with randomized inter-well splits yield more optimistic error estimates than the same models applied to blind wells, with a median difference of 0.58 RMSE when predicting grain size in phi units. Using these data and results, we establish a baseline workflow for applying machine-learning regression algorithms to predict core-based reservoir properties from well-log and core-image data. We hope that our findings, together with the open-source code and datasets released with this paper, will serve as a baseline for further research to improve geological property prediction for sustainable earth-resource modeling.
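The difference between the two splitting strategies can be illustrated with a minimal sketch. It is illustrative only and is not the code released with the paper: the record fields (`well`, `gamma_ray`, `grain_size_phi`), the well IDs, the toy values, and the helper names are all hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def random_split(samples, test_fraction=0.25, seed=0):
    """Randomized split: samples from one well can land in both train and test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def blind_well_split(samples, blind_wells):
    """Blind-well split: every sample from a held-out well goes to the test set."""
    train = [s for s in samples if s["well"] not in blind_wells]
    test = [s for s in samples if s["well"] in blind_wells]
    return train, test


# Hypothetical toy records: well ID, one log value, one core-measured grain size (phi).
samples = [
    {"well": w, "gamma_ray": 60.0 + 5 * i, "grain_size_phi": 2.0 + 0.1 * i}
    for w in ("A", "B", "C", "D")
    for i in range(5)
]

train, test = blind_well_split(samples, blind_wells={"D"})
train_wells = {s["well"] for s in train}
test_wells = {s["well"] for s in test}
```

With the blind-well split, `train_wells` and `test_wells` never overlap, so the test error reflects prediction at a well the model has not seen. The randomized split gives no such guarantee, which is one plausible reason its error estimates look more optimistic.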