Facies in the wells drilled in the study area were interpreted manually from cores taken at every 10 meters of depth during drilling. This core spacing does not capture the true facies of all wells, because a 10-meter sampling interval is too coarse. Extracting cores is also expensive and time-consuming. The machine learning methodology consists of four steps (data gathering, data preprocessing, model training, and model evaluation). This work applies two supervised machine learning techniques: the random forest and decision tree models. (1) Data gathering: the dataset was collected from the Basrah Oil Company. It contains ten wells (B-3, B-4, B-5, B-15, B-17, B-18, B-19, B-34, B-39, and B-40). Every well has six features (logs): sonic log, deep resistivity, micro spherically focused log, neutron porosity, density log, and gamma ray. The dataset also contains ten facies labels: mudstone, wackestone, packstone, roundstone, floatstone, shale, mudstone and wackestone, wackestone and packstone, packstone and grainstone, and packstone and floatstone. These logs cover the full thickness of the Mishrif Formation, which is the target of our study. (2) Preprocessing: the data must be cleaned of outlier values, because these values reduce the accuracy of the model during training. At this stage it is also necessary to understand the relationships between all the features, because the more highly correlated two features are, the less additional information they provide to the machine learning model. Well B-5 was withheld from training as a blind well to demonstrate the ability of the machine learning models to predict lithofacies; the dataset was then split randomly into 70% for training, 10% for validation, and 20% for testing to verify the performance. (3) Model training: two machine learning models, decision tree and random forest, were trained.
(4) Model evaluation: four statistics were computed from the confusion matrix for both models (classification accuracy, recall, precision, and F1-score). They show that the random forest was more accurate than the decision tree, because the random forest model handles a dataset of this size very well. The Receiver Operating Characteristic curves of the random forest model yielded a larger Area Under the Curve than those of the decision tree. The curves lie above the main diagonal for all lithotypes, and the values exceeded 95% for all classes, the exception being class 6 (shale), which was classified with 100% accuracy. Facies classification by a machine learning approach has two benefits: (1) it increases the accuracy of oil reservoir description, and (2) it reduces the time a geologist needs to interpret log data.
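As an illustration of the evaluation step, the four statistics named above can be derived from a confusion matrix roughly as follows. This is a minimal sketch and not the authors' implementation: the function name `per_class_metrics` is hypothetical, and the 3-class matrix holds made-up counts chosen only to show the arithmetic, not results from the study.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and per-class precision, recall and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (rows = true labels, columns = predictions).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    # Accuracy: correctly classified samples (the diagonal) over all samples.
    accuracy = sum(cm[i][i] for i in range(n)) / total

    metrics = []
    for k in range(n):
        tp = cm[k][k]                               # true positives for class k
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, but another class
        fn = sum(cm[k]) - tp                        # class k, but predicted otherwise
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, metrics


# Hypothetical confusion matrix for three facies classes (illustrative counts only).
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 0, 10],
]

accuracy, metrics = per_class_metrics(cm)
print(f"accuracy = {accuracy:.3f}")
for k, m in enumerate(metrics):
    print(f"class {k}: precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} f1={m['f1']:.3f}")
```

Reporting precision, recall, and F1 per class alongside overall accuracy matters when the facies classes are unevenly represented, since a high overall accuracy can hide poor performance on a rare lithotype.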