Accurately describing the evolution of the water droplet size distribution in crude oil is fundamental for evaluating water separation efficiency in dehydration systems. The separation of an aqueous phase dispersed in a dielectric oil phase, whose dielectric constant is significantly lower than that of the dispersed phase, can be enhanced by increasing the water droplet size through the application of an electrostatic field in the pipeline. Mathematical models of this process, while accurate, are computationally expensive. Herein, we introduce a constrained machine learning (ML) surrogate model developed from a population balance model. This surrogate serves as a practical alternative that provides fast and accurate predictions. The constrained ML model uses an extreme gradient boosting (XGBoost) algorithm tuned with a genetic algorithm (GA). Its input variables are the key parameters of the electrostatic dehydration process, namely droplet diameter, voltage, crude oil properties, temperature, and residence time, and its output is the number of water droplets per unit volume. Furthermore, we modified the objective function of the XGBoost algorithm by incorporating two penalty terms so that the model's predictions adhere to physical principles. The constrained model achieved high accuracy on the test set, with a mean squared error of 0.005 and a coefficient of determination of 0.998. The model was validated by comparison with experimental data and with the results of the population balance mathematical model. The analysis shows that the initial droplet diameter and voltage have the greatest influence on the model's predictions, which aligns with the observed behaviour in the real-world process.
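To illustrate how a penalty term can be added to a gradient boosting objective, the sketch below computes the gradient and Hessian of a squared-error loss augmented with a single quadratic penalty on negative predictions. The abstract does not state the form of the two penalty terms used in the study, so the non-negativity constraint (a droplet number density cannot be negative), the function names, and the weight `lam` are assumptions made purely for illustration, not the authors' formulation.

```python
import numpy as np


def penalized_squared_error(preds, labels, lam=10.0):
    """Gradient and Hessian of 0.5*(pred - y)^2 + 0.5*lam*min(pred, 0)^2.

    The penalty (hypothetical, not taken from the paper) is active only
    where a prediction is negative, i.e. physically impossible for a
    droplet count per unit volume.
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    negative = preds < 0.0

    # Squared-error gradient plus the penalty gradient on negative outputs.
    grad = (preds - labels) + lam * np.where(negative, preds, 0.0)
    # Second derivative: 1 from the squared error, lam where the penalty is active.
    hess = 1.0 + lam * negative.astype(float)
    return grad, hess


def xgb_objective(preds, dtrain):
    """Adapter with the (preds, dtrain) signature of an XGBoost custom objective."""
    return penalized_squared_error(preds, dtrain.get_label())
```

With XGBoost installed, an adapter of this shape could be supplied through the `obj` argument of `xgboost.train`. A second constraint would be added in the same way, by contributing its own first and second derivatives to `grad` and `hess`.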