Abstract

This study aims to estimate the recovery factor (RF), a key property for exploration, from other reservoir characteristics, such as porosity, permeability, pressure, and water saturation, via machine learning (ML). The database dependence of ML algorithms in estimating the hydrocarbon RF at the reservoir scale, however, has not yet been addressed. We therefore used various combinations of three databases and applied three regression-based models, namely extreme gradient boosting (XGBoost), support vector machine (SVM), and stepwise multiple linear regression (MLR), to construct the ML models and estimate the oil and/or gas RF. Using two databases and the cross-validation method, we evaluated the performance of the ML models. The third, independent database was then used to further assess the constructed models. We found that the XGBoost model estimated the oil and gas RF for the training and testing datasets more accurately than the SVM and MLR models. In the estimation of oil RF for the testing dataset in the largest database, we found RMSE = 0.111 for the XGBoost model, compared with RMSE = 0.130 and 0.134 for the SVM and MLR models, respectively. However, the performance of all the models was unsatisfactory for the independent databases. The results demonstrated that the ML algorithms were highly dependent on, and sensitive to, the databases on which they were trained. Statistical tests revealed that this unsatisfactory performance arose because the distributions of input features and target variables in the training datasets were significantly different from those in the independent databases (p-value < 0.05).
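The two quantities the abstract relies on, the RMSE of an estimate and a significance test for a difference between two distributions, can be illustrated with a minimal sketch. The abstract does not name the statistical test used, so the two-sample Kolmogorov-Smirnov statistic with a permutation p-value below is an assumption chosen for illustration, not the authors' method. The porosity samples are synthetic, and the function names (`rmse`, `ks_statistic`, `permutation_p_value`) are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and estimated values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Share of random relabelings whose KS statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic porosity values (assumption, not data from the study):
# the "independent" sample is drawn from a shifted distribution.
rng = random.Random(42)
train_porosity = [rng.gauss(0.20, 0.05) for _ in range(200)]
independent_porosity = [rng.gauss(0.30, 0.05) for _ in range(200)]

p_value = permutation_p_value(train_porosity, independent_porosity)
print(f"KS statistic = {ks_statistic(train_porosity, independent_porosity):.3f}")
print(f"p-value      = {p_value:.4f}")
```

A p-value below 0.05 for an input feature would indicate that the training and independent samples are unlikely to come from the same distribution, which is the kind of mismatch the abstract identifies as the cause of poor performance on the independent databases.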
