Chain-based machine learning for full PVT data prediction

Kassem Ghorayeb, Hussein Mustapha, Nour El Droubi, Obeida El Jundi, Arwa Ahmed Mawlod, Alaa Maarouf, Robert Merrill, Qazi Sami

doi:10.1016/j.petrol.2021.109658

Abstract

Building machine learning (ML) models on pressure-volume-temperature (PVT) data is essential for capturing trends and predicting fluid behavior in a heterogeneous and highly nonlinear thermodynamic system. PVT samples stored in an oil company database are often incomplete and may be missing black oil properties, compositional properties, or both. Building optimized fluid models therefore first requires a clean, structured PVT database that is complete with all the required properties. We present multiple novel algorithms developed to accurately predict a complete set of black oil and compositional properties within a PVT database. The proposed methodology predicts properties in series, starting from a minimal set of data (black oil and compositional) and ending with a complete set of data for all PVT samples. The order of prediction uses the existing data to predict the missing data, proceeding from the most strongly correlated property to the least. We also honor the physical nature of the correlations between properties when ranking properties for prediction, as this leads to less error propagation. In addition, we cluster the data before training the ML models. Clustering categorizes the fluid samples into families based on the collective behavior of their features and hence improves the quality of the predicted PVT sample properties. Several options are tested, in which clustering is performed using black oil properties only, compositional properties, or a combined set of black oil and compositional properties; the clustering scenario that leads to the least prediction error is adopted.
We have trained ML models to generically predict all black oil and compositional properties resulting from laboratory experiments, including the molecular weights (MW) and mole fractions of heavy fractions (any set of heavy fractions), from the mole fractions of all the commonly available components up to C7+ and the molecular weight of C7+. The massive data set used in this paper enabled comprehensive testing of the developed algorithms and yielded striking accuracy for the predicted PVT properties, especially the compositional ones. Despite the significant efforts reported in the literature on predicting PVT properties, the missing link is a systematic methodology to complete the properties of PVT samples in a consistent manner. Furthermore, the literature focuses mainly on forecasting black oil properties; compositional properties are scarcely considered. The algorithms developed in this paper address these two limitations and are tested using a uniquely large data set available for onshore and offshore fields and reservoirs in Abu Dhabi. To the best of our knowledge, no previous algorithms have been tested on such a large data set.

• Systematic, chain-based methodology to complete the compositional and black oil properties of PVT samples in a consistent manner.
• Clustering to categorize fluid samples based on their thermodynamic behavior and to reveal deep insights into these categories.
• Applying the proposed algorithms to compositional splitting provides a powerful illustration of their capability.
• Algorithms are tested using a uniquely large data set available for onshore and offshore fields and reservoirs in Abu Dhabi.
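
The chain-based idea described in the abstract (predict the most strongly correlated missing property first, then reuse it as an input for the next one) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-predictor least-squares line in place of the paper's ML models, it omits the clustering step, and the function names (`pearson`, `fit_line`, `complete_chain`) and the property names used in the usage note are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observed value to fit")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def complete_chain(table, known):
    """Fill gaps (None) property by property, strongest correlation first.

    table: dict mapping property name -> list of float or None (one per sample).
    known: names of properties that have no gaps (the minimal starting set).
    Returns (completed copy of table, order in which properties were filled).
    """
    table = {name: list(values) for name, values in table.items()}  # do not mutate input
    available = list(known)
    pending = [name for name in table if name not in available]
    order = []
    while pending:
        best = None  # (abs correlation, target, predictor)
        for target in pending:
            rows = [i for i, v in enumerate(table[target]) if v is not None]
            ys = [table[target][i] for i in rows]
            for pred in available:
                xs = [table[pred][i] for i in rows]
                r = abs(pearson(xs, ys))
                if best is None or r > best[0]:
                    best = (r, target, pred)
        _, target, pred = best
        rows = [i for i, v in enumerate(table[target]) if v is not None]
        slope, intercept = fit_line(
            [table[pred][i] for i in rows], [table[target][i] for i in rows]
        )
        table[target] = [
            v if v is not None else slope * table[pred][i] + intercept
            for i, v in enumerate(table[target])
        ]
        # The completed property becomes a candidate predictor for the rest of the chain.
        available.append(target)
        pending.remove(target)
        order.append(target)
    return table, order
```

As a usage illustration with made-up numbers, calling `complete_chain({"rs": [100, 200, 300, 400, 500], "bo": [1.1, 1.2, None, 1.4, 1.5], "visc": [2.0, 1.4, 1.1, None, 0.9]}, known=["rs"])` fills `bo` before `visc`, because `bo` is the property most strongly correlated with the only complete one. Filling the best-correlated property first is what limits error propagation down the chain; a fuller version would substitute a trained regressor per cluster for `fit_line`.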

Full Text