Abstract
Identifying lithology type is an important step in characterizing and evaluating hydrocarbon formations during oil and gas exploration. However, conventional lithology identification is inefficient because it requires considerable time and effort. This study predicts lithology from well log data using a machine learning algorithm, K-Nearest Neighbors (KNN). KNN assigns each sample the majority class among its K nearest neighbors in the training data, where nearness is measured by Euclidean distance. Because the choice of the hyperparameter K affects accuracy, K is optimized to find the value that gives the highest accuracy. Model performance is assessed by comparing the lithology predicted by KNN with the lithology from qualitative interpretation, using precision, recall, and F1 score as measures of model accuracy. With wells KP-24, KP-54, and KP-57 as training data and well KP-34 as test data, K = 5 gives the highest accuracy, 89.9%. Limestone dominates the predicted lithology, which agrees with the geology of the well area: the well penetrates the Cibulakan Formation and reaches the underlying Baturaja Formation. These results indicate that the KNN model trained on the training wells predicts lithology that is representative of the actual lithology. On well KP-34, compared with qualitative interpretation, the machine learning predictions can resolve thin layers and produce data labels in a relatively short time.
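The workflow summarized above (KNN with Euclidean distance, a sweep over K, and scoring with precision, recall, and F1) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two input logs (gamma ray and bulk density), the three lithology labels, and all sample values are hypothetical toy data, and z-score scaling of the logs and macro-averaged F1 as the selection criterion are assumptions the abstract does not specify. Scoring K directly on the held-out well mirrors the train/test split described; a separate validation split would avoid tuning on the test data.

```python
import math
from collections import Counter

# Hypothetical toy log samples: [gamma ray (API), bulk density (g/cc)].
# These values are invented for illustration; they are NOT the KP-xx well data.
train_X = [[18, 2.68], [22, 2.66], [25, 2.70],
           [105, 2.45], [115, 2.47], [120, 2.42],
           [40, 2.30], [48, 2.28], [52, 2.33]]
train_y = ["limestone"] * 3 + ["shale"] * 3 + ["sandstone"] * 3
test_X = [[20, 2.67], [110, 2.44], [45, 2.31], [30, 2.62]]
test_y = ["limestone", "shale", "sandstone", "limestone"]


def standardize(train, other):
    """Z-score each log using training-set mean and std (assumed preprocessing)."""
    n = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(n)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / len(train)) or 1.0
            for j in range(n)]

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n)] for r in rows]
    return scale(train), scale(other)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training samples closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    nearest = order[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # Break ties in favour of the tied label whose member is closest to x.
    for i in nearest:
        if votes[train_y[i]] == top:
            return train_y[i]


def per_class_scores(y_true, y_pred):
    """Precision, recall, and F1 for each lithology label."""
    scores = {}
    for lab in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = (prec, rec, f1)
    return scores


def macro_f1(y_true, y_pred):
    s = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in s.values()) / len(s)


def sweep_k(train_X, train_y, test_X, test_y, k_values):
    """Score each candidate K on the test well and return the best one."""
    tr, te = standardize(train_X, test_X)
    results = {}
    for k in k_values:
        preds = [knn_predict(tr, train_y, x, k) for x in te]
        results[k] = macro_f1(test_y, preds)
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = sweep_k(train_X, train_y, test_X, test_y, [1, 3, 5, 7])
    for k, score in results.items():
        print(f"K={k}: macro F1 = {score:.3f}")
    print("best K:", best)
```

With real log data, the same steps are commonly done with scikit-learn: `KNeighborsClassifier(n_neighbors=5)` uses the Minkowski metric with `p=2` (Euclidean distance) by default, and `classification_report` prints per-class precision, recall, and F1.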