A class-rebalancing self-training semisupervised learning for imbalanced data lithology identification

Shitao Yin, Xiaochun Lin, Zhifeng Zhang, Xiang Li

doi:10.1190/geo2023-0080.1

Abstract

Lithology identification plays a crucial role in petroleum geologic exploration, and machine learning (ML) has become increasingly prevalent in intelligent lithology identification in recent years. However, identifying lithologies is difficult because lithologic labels are scarce and lithologies are distributed unevenly. To address these issues and obtain satisfactory identification results, this study investigates a class-rebalancing self-training (CReST) lithology identification framework. The framework takes well-logging data and a limited set of lithologic labels as input and achieves promising lithology classification through the CReST approach. Four ML algorithms with high overall performance, namely the bagging classifier, extra trees classifier, random forest classifier, and support vector classifier, are selected from 25 common algorithms to build CReST models. The classification results of these models are compared and analyzed under three conditions. The experimental findings indicate that (1) under label scarcity, recognition performance varies greatly across categories with different numbers of samples; (2) with self-training (ST), overall performance improves, but the performance gap caused by category imbalance also widens; and (3) under the CReST framework, the models effectively resolve the identification problems caused by scarce labels and an imbalanced category distribution. Specifically, the precision for categories with fewer samples improves by more than 20%.
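The abstract does not spell out how the class-rebalancing step works. As an illustration only, the sketch below follows the selection rule of the original CReST method (Wei et al., CVPR 2021) as we understand it, not the authors' code for this study. Classes are ranked by labeled count N_1 >= ... >= N_L, and the pseudo-labels predicted as the class ranked k are kept at rate mu_k = (N_{L+1-k} / N_1) ** alpha, so rare lithologies keep all of their confident pseudo-labels while common ones keep only a fraction. Every function name here is hypothetical. The nearest-centroid classifier and its 1 / (1 + distance) confidence are assumed stand-ins for the bagging, extra trees, random forest, or support vector classifiers named above. Rebuilding the pseudo-labeled set from scratch each generation is also an assumption.

```python
import math
from collections import Counter


def crest_rates(label_counts, alpha=1.0):
    """Per-class pseudo-label inclusion rate mu_k = (N_{L+1-k} / N_1) ** alpha.

    Classes are ranked by labeled frequency (N_1 >= ... >= N_L). The class
    ranked k borrows the count of the class ranked L+1-k, so the rarest class
    gets rate 1.0 and the most frequent class gets the smallest rate.
    """
    ranked = sorted(label_counts, key=label_counts.get, reverse=True)
    n_max, n_cls = label_counts[ranked[0]], len(ranked)
    return {c: (label_counts[ranked[n_cls - 1 - i]] / n_max) ** alpha
            for i, c in enumerate(ranked)}


def select_pseudo_labels(predictions, rates):
    """Keep the most confident fraction rates[c] of samples predicted as class c.

    predictions: iterable of (unlabeled_index, predicted_class, confidence).
    Returns a list of (unlabeled_index, predicted_class) pairs.
    """
    by_class = {}
    for idx, cls, conf in predictions:
        by_class.setdefault(cls, []).append((conf, idx))
    selected = []
    for cls, items in by_class.items():
        items.sort(reverse=True)  # highest confidence first
        k = math.floor(len(items) * rates[cls] + 1e-9)  # epsilon guards float round-off
        selected.extend((idx, cls) for _, idx in items[:k])
    return selected


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in classifier: mean log-feature vector per class."""
    counts, sums = Counter(y), {}
    for x, c in zip(X, y):
        acc = sums.setdefault(c, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return (class, confidence); confidence = 1 / (1 + distance) is a heuristic."""
    best_c, best_d = None, math.inf
    for c, mu in centroids.items():
        d = math.dist(x, mu)
        if d < best_d:
            best_c, best_d = c, d
    return best_c, 1.0 / (1.0 + best_d)


def crest_self_training(X_lab, y_lab, X_unlab, generations=3, alpha=1.0):
    """Self-training in which each generation keeps pseudo-labels at CReST rates."""
    rates = crest_rates(Counter(y_lab), alpha)  # fixed by the labeled class distribution
    X_train, y_train = list(X_lab), list(y_lab)
    for _ in range(generations):
        model = nearest_centroid_fit(X_train, y_train)
        preds = [(i, *nearest_centroid_predict(model, x)) for i, x in enumerate(X_unlab)]
        chosen = select_pseudo_labels(preds, rates)
        # Rebuild the training set from the true labels plus this generation's picks.
        X_train = list(X_lab) + [X_unlab[i] for i, _ in chosen]
        y_train = list(y_lab) + [c for _, c in chosen]
    return nearest_centroid_fit(X_train, y_train)


# Hypothetical labeled counts for three lithologies.
print(crest_rates({"sandstone": 40, "mudstone": 20, "coal": 5}))
# {'sandstone': 0.125, 'mudstone': 0.5, 'coal': 1.0}
```

The exponent alpha controls how aggressive the rebalancing is. With alpha = 0 every rate equals 1 and the loop reduces to plain self-training, which matches the ST condition in the abstract. Larger values shrink the majority-class rates further, so minority lithologies make up a larger share of the pseudo-labeled set.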
