Lithology identification is essential for geological analysis and reservoir characterization of hydrocarbon-bearing formations. Conventional approaches to lithology identification are inconvenient, time-consuming, and expensive. Machine learning (ML) algorithms have been introduced as an effective, reliable, and accurate method of identifying lithology from well-log data because they can capture the nonlinear relationship between log responses and lithology. In this study, we employed an XGBoost classifier to assess the parametric sensitivity of logging features in identifying lithology. The Sequential Forward Feature Selection (SFFS) approach was applied to determine the set of logging features that yields the maximum classification accuracy. Nine logging parameters were used as inputs to the XGBoost model to classify five lithology classes. Hyperparameters were tuned over appropriate search ranges using grid search with 5-fold cross-validation to optimize model performance. The training and testing accuracies obtained from the model are 96% and 90.97%, respectively. The optimized XGBoost model showed the lowest precision, recall, and F1 score for the clay class. The shale class, by contrast, had the highest precision, recall, and F1 score, at 95%, 94%, and 94%, respectively. The best set of logging features for identifying formation lithology was determined to be [PEF, SP, LLD, CALI, MSFL, NPHI, LLS], with a classification accuracy of 92.10% on the testing dataset. The least significant logging parameters for identifying lithology were found to be DT and GR, as adding either of them to the best input feature set reduced classification accuracy. Finally, we ranked the top three sets of logging features according to their classification accuracy.
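The forward selection step described above can be illustrated with a minimal sketch. It is not the study's implementation: the function `forward_select`, the scorer `toy_score`, and the per-log gains are all hypothetical, and the gain values are invented for illustration rather than taken from the reported results. In practice the scorer would presumably be the cross-validated accuracy of the tuned XGBoost classifier on the chosen subset of logs.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[str, ...]


def forward_select(
    features: Sequence[str],
    score: Callable[[Subset], float],
) -> List[Tuple[Subset, float]]:
    """Greedy sequential forward selection.

    Starts from an empty set and, at each step, adds the single remaining
    feature that gives the highest score. Returns the whole selection path
    as (subset, score) pairs, one entry per subset size.
    """
    selected: List[str] = []
    remaining = list(features)
    path: List[Tuple[Subset, float]] = []
    while remaining:
        # Evaluate every candidate extension and keep the best-scoring one.
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
        path.append((tuple(selected), score(tuple(selected))))
    return path


def toy_score(subset: Subset) -> float:
    # Hypothetical stand-in for cross-validated accuracy.
    # These gains are made up for the example, not values from the study.
    gains = {"PEF": 0.40, "SP": 0.20, "LLD": 0.10, "GR": -0.02, "DT": -0.03}
    return 0.20 + sum(gains[f] for f in subset)


LOGS = ["GR", "DT", "PEF", "SP", "LLD"]
path = forward_select(LOGS, toy_score)

# The best subset is the highest-scoring entry on the path; features added
# after that point lower the score, which marks them as least significant.
best_subset, best_score = max(path, key=lambda step: step[1])
for subset, value in path:
    print(f"{len(subset)} logs: {list(subset)} -> {value:.2f}")
```

Recording the full path, rather than stopping at the first drop in score, makes it possible both to rank the leading feature sets and to see which logs reduce accuracy when added.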