Optimized feature selection assists lithofacies machine learning with sparse well log data combined with calculated attributes in a gradational fluvial sequence

David A Wood

doi:10.1016/j.aiig.2022.11.003

Abstract

Using machine learning (ML) to predict lithofacies from sparse suites of well-log data is difficult in laterally and vertically heterogeneous reservoir formations in oil and gas fields. Meandering and braided fluviatile depositional environments tend to form clastic sequences with laterally discontinuous layers because relatively narrow sandstone channels shift continuously. Three cored wellbores drilled through such a reservoir in a large oil field, with only four recorded well logs available, are used to classify four lithofacies with ML models. To augment the well-log data, six derivative and volatility attributes were calculated from each of the recorded gamma ray and density logs, providing sixteen log features for the ML models to select from. A novel, multiple-optimizer feature selection technique was developed to identify high-performing feature combinations; seven ML models, assisted by multi-k-fold cross validation, then used these combinations to predict lithofacies. Feature combinations with only seven to nine selected log features achieved an overall lithofacies prediction accuracy of 0.87 for the two wells used for training and validation. When the trained ML models were applied to the third well for testing, lithofacies prediction accuracy declined to 0.65 for the best-performing model, an extreme gradient boosting (XGBoost) model with seven features. However, that model achieved an accuracy of ∼0.76 in predicting the presence of the pay-bearing sandstone and siltstone lithofacies in the test well. A model using only the four recorded well logs predicted the pay-bearing lithofacies with only ∼0.6 accuracy. Annotated confusion matrices and feature importance analysis provide additional insight into ML model performance and identify the log attributes that contribute most to lithofacies prediction.
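The feature-augmentation step can be illustrated with a minimal Python sketch. The abstract does not define the six attributes, so the ones used here are hypothetical assumptions: first and second depth derivatives, plus rolling standard deviations ("volatility") of the log and of its first derivative over 3- and 5-sample windows. The log names `GR`, `RHOB`, `LOG_C` and `LOG_D` are also placeholders. Assuming six attributes are derived from each of the gamma ray and density logs, four recorded logs plus twelve derived attributes give sixteen candidate features.

```python
import math
from typing import Dict, List, Sequence, Tuple


def first_derivative(values: Sequence[float], step: float = 1.0) -> List[float]:
    """Depth derivative: central differences inside, one-sided differences at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = [0.0] * n
    out[0] = (values[1] - values[0]) / step
    out[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / (2.0 * step)
    return out


def rolling_std(values: Sequence[float], window: int) -> List[float]:
    """Population standard deviation over a centred odd-length window, truncated at the log ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        mean = sum(seg) / len(seg)
        out.append(math.sqrt(sum((v - mean) ** 2 for v in seg) / len(seg)))
    return out


def derived_attributes(name: str, log: Sequence[float], step: float = 1.0) -> Dict[str, List[float]]:
    """Six hypothetical derivative/volatility attributes for one log (names are illustrative)."""
    d1 = first_derivative(log, step)
    d2 = first_derivative(d1, step)
    return {
        f"{name}_D1": d1,
        f"{name}_D2": d2,
        f"{name}_VOL3": rolling_std(log, 3),
        f"{name}_VOL5": rolling_std(log, 5),
        f"{name}_D1VOL3": rolling_std(d1, 3),
        f"{name}_D1VOL5": rolling_std(d1, 5),
    }


def build_feature_table(
    logs: Dict[str, List[float]],
    derive_from: Tuple[str, ...] = ("GR", "RHOB"),
    step: float = 1.0,
) -> Dict[str, List[float]]:
    """Recorded logs plus derived attributes for the selected logs, all sampled at the same depths."""
    features: Dict[str, List[float]] = dict(logs)
    for name in derive_from:
        features.update(derived_attributes(name, logs[name], step))
    return features


if __name__ == "__main__":
    depth_samples = 10
    logs = {
        "GR": [40.0 + 8.0 * (i % 4) for i in range(depth_samples)],
        "RHOB": [2.30 + 0.02 * i for i in range(depth_samples)],
        "LOG_C": [0.0] * depth_samples,  # placeholder for a third recorded log
        "LOG_D": [1.0] * depth_samples,  # placeholder for a fourth recorded log
    }
    table = build_feature_table(logs)
    print(len(table), sorted(table))
```

In a workflow like the one the abstract describes, a table of this kind (one row per depth sample) would be the pool from which a feature selection step picks subsets of seven to nine features; the window lengths and attribute choices here are illustrative only.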
