Machine learning is the main technical approach for identifying lithofacies from well logs. Mud shale lithofacies, the main target in predicting the spatial distribution of shale oil, are difficult to identify because of stratigraphic heterogeneity and redundancy in logging information. Choosing the machine learning method best suited to the geological characteristics and data conditions is therefore key to high-precision lithofacies identification. However, few studies have examined the applicability of machine learning methods to mud shale lithofacies. This paper compares commonly used machine learning methods for lithofacies identification. It employs five supervised learning algorithms, namely Random Forest (RF), Back-Propagation Artificial Neural Network (BPANN), Gradient Boosting Decision Tree (GBDT), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), as well as four unsupervised learning algorithms, namely K-means, DBSCAN, SOM, and MRGC. Results are evaluated with confusion matrices, from which the accuracy of each algorithm is obtained. Among the supervised algorithms, GBDT achieves the best accuracy; among the unsupervised algorithms, K-means and DBSCAN achieve the highest accuracy. The comparison indicates that shale lithofacies identification is challenging because sample data are limited and the distributions of the lithofacies types overlap strongly, so selecting an appropriate algorithm is crucial. Although supervised algorithms are generally accurate, they are limited by the number of lithofacies samples available. Future research should focus on making the most of limited samples for supervised learning and on combining supervised learning with unsupervised algorithms to explore the lithofacies types of non-cored wells.
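The confusion-matrix evaluation mentioned in the abstract can be illustrated with a minimal sketch. The class labels ("A", "B", "C") and the sample predictions below are hypothetical placeholders, not data from the study, and the helper names are illustrative. The sketch also assumes the predicted labels are already expressed as lithofacies classes; the abstract does not say how clusters from the unsupervised algorithms were mapped to lithofacies.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Build a confusion matrix: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical lithofacies classes and labels, for illustration only.
classes = ["A", "B", "C"]
true_labels = ["A", "A", "B", "B", "C", "C"]
predicted_labels = ["A", "B", "B", "B", "C", "A"]

matrix = confusion_matrix(true_labels, predicted_labels, classes)
accuracy = overall_accuracy(matrix)
print(matrix)
print(round(accuracy, 3))
```

Off-diagonal entries show which lithofacies types are confused with one another, which is where overlapping type distributions would be expected to appear.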