Improved well log classification using semisupervised Gaussian mixture models and a new hyper-parameter selection strategy

Michael W Dunham, Alison Malcolm, J Kim Welford

doi:10.1016/j.cageo.2020.104501

Abstract

Well log classification, the process of mapping well log measurements to lithofacies identified from core samples, is a common procedure in the oil and gas industry. Manually assigning lithofacies to wireline log measurements without core can be time-consuming and can also introduce bias. Supervised machine learning algorithms are commonly used to automate this process, but they are prone to overfitting when training data are scarce, which is common for well log classification problems. Semisupervised machine learning algorithms are designed for classification problems with minimal training data, and we adopt a semisupervised Gaussian mixture model (ssGMM) method to solve this problem. The dataset we consider comes from a machine learning competition held in 2016, and we simulate a semisupervised scenario by treating only one of the ten wells as labeled data. We apply ssGMM to this well log dataset and compare its performance to that of XGBoost, the supervised method that won the competition. To try to improve the performance of both ssGMM and XGBoost, we also introduce a new hyper-parameter selection strategy that uses both the mean and the standard deviation of the cross-validation scores, in contrast to the default procedure, which uses only the mean. Our results indicate that ssGMM slightly outperforms XGBoost in our semisupervised context, which supports the suggestion that semisupervised algorithms are more appropriate when little training data is available. We also show that our new hyper-parameter selection technique selects hyper-parameters for ssGMM that perform better on the testing data, but the results are mixed for XGBoost.
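
To illustrate the difference between selecting hyper-parameters on the mean cross-validation score alone and selecting on the mean and standard deviation together, here is a minimal Python sketch. The abstract does not state how the two statistics are combined, so the `mean - penalty * std` rule, the function names, the candidate labels, and the fold scores below are all assumptions made for illustration and should not be read as the authors' actual procedure.

```python
from statistics import mean, stdev


def select_by_mean(cv_scores):
    """Default rule: pick the candidate with the highest mean CV score."""
    return max(cv_scores, key=lambda name: mean(cv_scores[name]))


def select_by_mean_and_std(cv_scores, penalty=1.0):
    """Hypothetical rule (an assumption, not the paper's formula):
    pick the candidate with the highest (mean - penalty * std), so that
    candidates whose scores vary widely across folds are penalized."""
    return max(
        cv_scores,
        key=lambda name: mean(cv_scores[name]) - penalty * stdev(cv_scores[name]),
    )


# Made-up per-fold accuracy scores for two hypothetical hyper-parameter settings.
cv_scores = {
    "candidate_a": [0.70, 0.50, 0.66, 0.58],  # slightly higher mean, unstable
    "candidate_b": [0.60, 0.59, 0.61, 0.60],  # slightly lower mean, stable
}

print("mean only:    ", select_by_mean(cv_scores))
print("mean and std: ", select_by_mean_and_std(cv_scores))
```

With these invented scores, the mean-only rule prefers the setting with the marginally higher average, while the combined rule prefers the setting whose scores are more consistent across folds.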

Full Text