Imputation of missing well log data by random forest and its uncertainty analysis

Runhai Feng, Dario Grana, Niels Balling

doi:10.1016/j.cageo.2021.104763

Abstract

Well logs are commonly used by geoscientists to infer and extrapolate physical properties of subsurface rocks. However, log values may be missing over some depth intervals because of operational issues during logging. To overcome this problem, an innovative approach to reconstructing well logs with machine learning is proposed. The missing well log values are predicted from the other, complete log measurements by a data-driven machine learning algorithm, namely random forest. A grid-search scheme is applied to find the combination of hyper-parameters that yields the best cross-validation score. During training, the relative importance of the different input features is analysed in order to remove weakly sensitive measurements and prioritize data that are strongly correlated with the target variables. Principal component analysis is applied to explore multicollinearity in the input features, so that only a few principal components of the transformed data vector are needed to represent a large fraction of the variance in the original data. To quantify the uncertainty in the predictions, a quantile regression tree is used to determine prediction intervals. Well log data from the Volve Field are used to validate the random forest predictions, and a high correlation coefficient between prediction and reference is achieved. Prediction intervals for different percentiles are estimated; the predictions are more accurate at depth points where the prediction intervals are narrow.
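The workflow summarized above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the synthetic "logs", the hyper-parameter grid, the 95% variance threshold and the 0.05 importance threshold are all hypothetical. For the uncertainty step it swaps in a Meinshausen-style quantile regression forest, which weights training targets by how often they share a leaf with the query point; this is one common way to obtain prediction intervals from a fitted forest.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 5 complete input logs, 1 target log.
n = 600
X = rng.normal(size=(n, 5))
X[:, 4] = X[:, 0] + 0.05 * rng.normal(size=n)  # deliberately collinear input
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)

# The last 100 samples play the role of the depth interval with missing target.
X_train, y_train = X[:500], y[:500]
X_gap, y_gap = X[500:], y[500:]  # y_gap is held back as the reference

# 1) Grid search over a small (hypothetical) grid, scored by 5-fold CV.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8],
                "min_samples_leaf": [1, 5]},
    cv=5,
)
grid.fit(X_train, y_train)
rf = grid.best_estimator_  # refit on all training data with the best params

# 2) Impurity-based feature importances; weak inputs could be dropped.
importances = rf.feature_importances_
keep = np.flatnonzero(importances >= 0.05)  # hypothetical threshold

# 3) PCA on standardized inputs to expose multicollinearity.
Z = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
pca = PCA().fit(Z)
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cum_var, 0.95) + 1)  # PCs for 95% variance

# 4) Quantile-regression-forest style prediction intervals.
def qrf_quantiles(forest, X_tr, y_tr, X_new, qs):
    leaves_tr = forest.apply(X_tr)    # (n_train, n_trees) leaf ids
    leaves_new = forest.apply(X_new)  # (n_new, n_trees) leaf ids
    order = np.argsort(y_tr)
    y_sorted = y_tr[order]
    out = np.empty((len(X_new), len(qs)))
    for j, leaf_row in enumerate(leaves_new):
        same = leaves_tr == leaf_row  # training samples sharing each tree's leaf
        # Weight = 1/leaf size per tree, averaged over trees (sums to 1).
        w = (same / same.sum(axis=0)).mean(axis=1)
        cdf = np.cumsum(w[order])
        idx = np.searchsorted(cdf, qs)
        out[j] = y_sorted[np.minimum(idx, len(y_sorted) - 1)]
    return out

intervals = qrf_quantiles(rf, X_train, y_train, X_gap, [0.1, 0.5, 0.9])
width = intervals[:, 2] - intervals[:, 0]  # P10-P90 range per depth sample

# Correlation between reconstructed and reference values in the gap.
r = np.corrcoef(rf.predict(X_gap), y_gap)[0, 1]
print(f"best params: {grid.best_params_}, r = {r:.3f}")
```

Computing the intervals from leaf co-membership means they come from the same fitted forest as the point predictions, so no second model is needed. The `width` array gives the per-sample P10 to P90 range, which is the quantity one would compare against prediction error when checking whether narrow intervals coincide with accurate predictions.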

Full Text