Abstract

The dielectric tool is a specialized logging tool used to measure oil saturation independently of water salinity, enabled by complex multi-frequency measurements of the formation's dielectric properties. Conventional resistivity and salinity-dependent tools are more commonly used, but they carry high measurement uncertainty in variable-salinity environments. In this paper, we developed a unique data-driven model combining several supervised and unsupervised machine learning algorithms to predict the dielectric-based saturation using readily available reservoir and well data. More than 20 features were extracted from common reservoir, well, and petrophysical data, such as fluid densities, porosity, free water level, rock type, caliper, and resistivity-based oil saturation. We also handcrafted features relating the sample's depth to the nearest injector and to the reservoir structure. The dataset of 4000 samples was randomly split into 80% training and 20% testing. Training used cross-validation to benchmark and tune various machine learning regression models, including ensemble methods, neural networks, and support vector machines. The performance of the regression models was further optimized by integrating dimensionality reduction and anomaly detection models. Ensemble methods, particularly random forest, produced the best cross-validation score among the regression models. When tested on unseen samples, the random forest model predicted the dielectric oil saturation with a correlation coefficient of 68%. Two dimensionality reduction techniques were evaluated to improve prediction: principal component analysis (PCA) and recursive feature elimination (RFE). A PCA scree plot analysis was performed to reduce the feature dimensionality to five principal components accounting for more than 80% of the data variability. The RFE model was used to select seven optimal features from the full set of original features.
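The evaluation steps described above (a random 80/20 split, scoring by correlation coefficient, and choosing the number of principal components from a cumulative-variance threshold) can be sketched in a few lines. This is only an illustrative, standard-library sketch and not the authors' implementation: the helper names `random_split`, `pearson_r`, and `components_for_variance` are hypothetical, the correlation coefficient is assumed to be Pearson's, and the eigenvalues are assumed to come from a PCA that has already been fitted elsewhere.

```python
import math
import random


def random_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and return (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


def components_for_variance(eigenvalues, threshold=0.8):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold (the scree plot criterion)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)
```

With 4000 samples and the default `test_fraction`, `random_split` yields 3200 training and 800 test samples. In practice a library such as scikit-learn would typically provide the split, PCA, RFE, and random forest steps; the sketch only makes the bookkeeping explicit.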
The random forest model was refitted on the reduced features from each technique, with the RFE features giving the better prediction by 4%. To further improve results, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) unsupervised model was used to detect anomalies across the high-dimensional features. The DBSCAN model identified more than 400 outliers, mostly located in the reservoir's top invaded and lower-quality production zones. After removing these outliers, the random forest prediction of dielectric oil saturation on the test samples improved to a correlation coefficient of 76%. Compared to conventional logs, the dielectric tool provides a salinity-independent measurement of oil saturation, but it is more expensive and relies on elaborate physical models. In this paper, we demonstrate the unique development of fully integrated supervised and unsupervised machine learning models for predicting the dielectric oil saturation based on soft computation of readily available features extracted from conventional reservoir and well sources.
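The outlier-removal step can be illustrated with a minimal, standard-library sketch of DBSCAN, in which points that belong to no dense region are labelled as noise and dropped before refitting. This is not the authors' code: the function names, the Euclidean distance metric, and the `eps` and `min_samples` values in the usage note are assumptions chosen for illustration, and the abstract does not report the parameters actually used.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def drop_outliers(points, eps, min_samples):
    """Keep only the points that DBSCAN assigns to some cluster."""
    labels = dbscan(points, eps, min_samples)
    return [p for p, label in zip(points, labels) if label != -1]
```

For example, given two tight groups of four points and one isolated point, `drop_outliers(points, eps=1.5, min_samples=3)` returns the eight clustered points and discards the isolated one. Because DBSCAN is distance-based, features on different scales would normally be standardized first.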
