Hydrogen solubility in n-alkanes: Data mining and modelling with machine learning approach

Afshin Tatar, Zohre Esmaeili-Jaghdan, Amin Shokrollahi, Abbas Zeinijahromi

doi:10.1016/j.ijhydene.2022.08.195

Abstract

Hydrogen solubility in hydrocarbons plays an important role in designing, optimizing, and modelling many processes, including underground hydrogen storage. This study applies four machine learning techniques, namely Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), and Extremely Randomized Trees (ET), to predict hydrogen solubility in n-alkanes. A comprehensive dataset covering almost all available experimental data was collected from the literature for model tuning and sensitivity analysis. The dataset includes 1845 data samples for 15 n-alkanes (C1 to C46) gathered from 29 experimental studies. Statistical analysis was conducted for data cleaning and pre-processing. Feature selection analysis showed that pressure (P), dimensionless pressure (PD), dimensionless temperature (TD), and critical pressure (PC) have the greatest effect on hydrogen solubility, which agrees with currently available thermodynamic models. The modelling results showed that the ensemble methods are more accurate than the DT model, with GB being the best predictor. The GB model showed high accuracy for both the training and testing datasets, with RMSE and R² values of 0.0086 and 0.9826, respectively, for the testing dataset. All models were also evaluated on a blind dataset, which was not included in training or testing, to confirm model applicability to wider data. All ensemble models likewise performed very well on this external dataset, with the RF model performing best (RMSE and R² values of 0.0050 and 0.9755, respectively). The findings of this study can support a better understanding of hydrogen solubility in n-alkanes, and consequently in petroleum, and can be used in underground hydrogen storage applications.
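
The two accuracy measures quoted in the abstract, RMSE and the coefficient of determination, can be illustrated with a minimal sketch. It assumes the standard textbook definitions (the abstract does not state the formulas used), and the function names and sample values below are hypothetical, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical solubility values, for illustration only (not from the paper).
    measured = [0.010, 0.025, 0.040, 0.060]
    predicted = [0.012, 0.024, 0.041, 0.058]
    print("RMSE:", rmse(measured, predicted))
    print("R2:", r2_score(measured, predicted))
```

Under these definitions, a lower RMSE and an R2 closer to 1 both indicate better agreement, which is how the testing and blind-dataset figures above are meant to be read.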

Full Text