Evaluation of Different Machine Learning Frameworks to Estimate CO2 Solubility in NaCl Brines: Implications for CO2 Injection into Low-Salinity Formations

Erfan Mohammadian, Amin Riazi, Bo Liu, Jingwei Huang

doi:10.2113/2022/1615832

Open Access

Abstract

An accurate estimation of carbon dioxide (CO2) solubility in brine is of great significance for industrial applications such as quantifying CO2 sequestration in subsurface formations, CO2 surface mixing, and different CO2-based enhanced oil recovery (EOR) methods. In this research, four data-driven/machine learning techniques, namely extreme gradient boosting (XGB), multilayer perceptron (MLP), K-nearest neighbor (KNN), and an in-house genetic algorithm (GA), were used to estimate CO2 solubility, with pressure, temperature, and salinity as model inputs and CO2 solubility as the output. The experimental database used in this study was collected by dissolving CO2 into NaCl brines at salinities ranging from 0 to 15000 ppm, temperatures ranging from 298 to 373 K, and pressures up to 200 atm. All data-driven models estimated solubility accurately, with coefficients of determination (R2) ranging from 0.95 to 0.99, and a precise, simple-to-use empirical solubility equation was developed using GA. The performance of the models was analyzed using error metrics such as mean absolute error and relative error. A detailed feature importance analysis was conducted using feature importance, permutation, and Shapley values to clarify the relationship between the input and output parameters. Pressure was found to be the most impactful feature, followed by temperature and salinity. The model’s accuracy was compared to a well-established solubility model from the literature, and good agreement between the two models’ results was observed. Lastly, a sensitivity analysis revealed that the model’s estimations remained accurate when pressure and salinity were beyond the ranges of the original dataset.
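The evaluation metrics named in the abstract (R2, mean absolute error, and relative error) can be illustrated with a minimal Python sketch. The function names and the numbers in `y_true` and `y_pred` are hypothetical placeholders chosen for illustration; they are not taken from the study's database or results, and the definitions used are the standard textbook ones, which may differ in detail from those applied in the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |measured - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_error(y_true, y_pred):
    """Average of |measured - predicted| / |measured|.

    Assumes every measured value is nonzero.
    """
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the measured values are not all identical (SS_tot > 0).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT experimental solubility data.
    y_true = [1.0, 2.0, 3.0, 4.0]   # "measured" solubility (arbitrary units)
    y_pred = [1.1, 1.9, 3.2, 3.8]   # "predicted" solubility (arbitrary units)

    print("MAE:", mean_absolute_error(y_true, y_pred))
    print("Mean relative error:", mean_relative_error(y_true, y_pred))
    print("R2:", r2_score(y_true, y_pred))
```

In practice the same three functions would be applied to the held-out measured and predicted solubilities of each model (XGB, MLP, KNN, GA) so that the models can be compared on a common basis.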
