Abstract

For many machine learning problems, there are sufficient data to train a wide range of algorithms. However, many geoscience applications are challenged by limited training data. Seismic petrophysical classification, mapping seismic data to litho-fluid classes, is one such example because the training labels come only from data gathered at wells. Supervised machine learning algorithms are prone to overfitting when training data are scarce, whereas semisupervised approaches are designed for these problems because the unlabelled data also inform the learning process. We adopt label propagation (LP) and self-training methods to solve this problem, because they are semisupervised methods that are conceptually simple and easy to implement. The supervised method we consider for comparison is the popular extreme gradient boosting (XGBoost) classifier. We generate our data set from the SEG Advanced Modelling (SEAM) Phase 1 model: we first synthesize seismic data from this model and then perform pre-stack seismic inversion to recover seismic attributes. We formulate a classification problem using the seismic attributes as unlabelled data, with training labels from a single well. The benefit of a synthetic problem is that we have full control and can quantitatively assess the machine learning predictions. Our initial results reveal that the inherent depth-dependent background trends of the input attributes produce artefacts in each of the machine learning predictions. We address this problem by using a simple median filter to remove these background trends. The detrended inputs improve performance for all three algorithms, in some cases by 10 to 20 per cent. XGBoost and LP perform similarly in some situations, but our results indicate that XGBoost is rather unstable depending on the attributes used.
However, LP coupled with self-training outperforms XGBoost by up to 10 per cent in some instances. Through this synthetic study, our results support the premise that semisupervised algorithms can provide more robust, generalized predictions than supervised techniques in minimal training data scenarios.
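The workflow the abstract describes, removing a depth-dependent background trend with a running median and then propagating sparse well labels through the unlabelled samples, can be sketched on a toy one-dimensional example. Everything below (the synthetic attribute, class offsets, filter window, and kernel parameters) is an illustrative assumption, not the study's actual data or configuration:

```python
# Illustrative sketch of the abstract's workflow on synthetic 1-D data:
# (1) remove a depth-dependent background trend with a running median,
# (2) propagate sparse "well" labels to the unlabelled samples.
# All values and names here are toy assumptions, not the study's data.
import numpy as np
from scipy.ndimage import median_filter
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)
n = 300

# Stand-in seismic attribute: a monotone depth trend plus small
# class-dependent offsets for three litho-fluid classes in thin beds.
depth_trend = np.linspace(2.0, 3.0, n)
true_class = (np.arange(n) // 20) % 3              # beds cycling 0, 1, 2
attr = depth_trend + 0.3 * true_class + rng.normal(0.0, 0.03, n)

# Detrend: subtract a long-window running median (the "simple median
# filter" fix); the slow trend cancels while class offsets survive.
residual = attr - median_filter(attr, size=101)

# Labels known only at sparse "well" depths; -1 marks unlabelled data,
# the convention scikit-learn's semisupervised estimators expect.
labels = np.full(n, -1)
well_idx = np.arange(10, n, 20)                    # 15 labelled depths
labels[well_idx] = true_class[well_idx]

X = residual.reshape(-1, 1)
lp = LabelPropagation(kernel="rbf", gamma=100).fit(X, labels)
pred = lp.transduction_                            # a class for every sample
accuracy = float((pred == true_class).mean())
print(f"transductive accuracy: {accuracy:.2f}")
```

Running LP on the raw `attr` instead of `residual` degrades the result, because the background trend makes the same class look different at different depths. The self-training comparison could be sketched analogously with scikit-learn's `SelfTrainingClassifier` wrapping a supervised base estimator.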
