Abstract

Variable selection is an important preprocessing step in the development of effective data-driven models for CO2 flow measurement in carbon capture and storage systems. To quantify the importance of candidate input variables with respect to the desired output, ensemble learning is incorporated into the variable selection methodology. This paper presents a tree-based heterogeneous ensemble approach to variable selection and its application to gas-liquid two-phase CO2 flow measurement. The importance of each variable is determined by combining the importance scores from four tree-based algorithms: decision tree regression, bootstrap aggregating (bagging) of regression trees, gradient boosting decision tree and gradient boosting random forest. The backward elimination algorithm is then applied to remove the less important variables, yielding a small set of input variables for the data-driven models. The selection results show that the significant variables for CO2 mass flow measurement include apparent mass flow rate, time shift, differential pressure and pressure drop, whereas those for gas volume fraction prediction include observed density, density drop, observed flow velocity and outlet temperature. To assess the validity of the selected variables, data-driven models based on gradient boosting random forest are developed. Results suggest that, with the selected variables as model inputs, the relative error of the model output is mostly within 1% for CO2 mass flow rate measurement and within 5% for gas volume fraction prediction.
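The abstract does not state how the four importance scores are combined or when backward elimination stops, so the following is only a minimal sketch under stated assumptions. It assumes that each algorithm's scores are normalised to sum to one and then averaged with equal weights, and that elimination repeatedly drops the lowest-ranked variable until the error returned by a hypothetical `evaluate` callback rises by more than a chosen tolerance. The names `combine_importances`, `backward_elimination`, `toy_error` and the variables `x1`..`x3` are illustrative inventions. The scores are made-up toy numbers, not results from this work. In practice, the per-algorithm scores could come from tree-based regressors that expose impurity-based importances, such as scikit-learn's `DecisionTreeRegressor` and `GradientBoostingRegressor` via `feature_importances_`.

```python
from typing import Callable, Dict, List, Sequence


def combine_importances(score_sets: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-algorithm importance scores into one ensemble score per variable.

    Assumption: each algorithm's scores are normalised to sum to 1 so that
    algorithms with different score scales contribute equally, then averaged.
    """
    variables = list(score_sets[0])
    combined = {v: 0.0 for v in variables}
    for scores in score_sets:
        total = sum(scores[v] for v in variables)
        for v in variables:
            combined[v] += scores[v] / total / len(score_sets)
    return combined


def backward_elimination(
    importance: Dict[str, float],
    evaluate: Callable[[List[str]], float],
    tolerance: float = 0.0,
    min_vars: int = 1,
) -> List[str]:
    """Repeatedly drop the least important variable while the error stays acceptable.

    `evaluate` is a hypothetical callback that trains and validates a model on
    the given variables and returns an error (lower is better).
    """
    # Rank variables from most to least important.
    selected = sorted(importance, key=importance.get, reverse=True)
    best_error = evaluate(selected)
    while len(selected) > min_vars:
        candidate = selected[:-1]  # drop the lowest-ranked remaining variable
        error = evaluate(candidate)
        if error > best_error + tolerance:
            break  # removing this variable degrades the model too much
        selected = candidate
        best_error = min(best_error, error)  # later steps compare against the best error so far
    return selected


# Toy usage with made-up scores for generic variables (not results from the paper).
scores = [
    {"x1": 0.6, "x2": 0.3, "x3": 0.1},  # one algorithm, scores already summing to 1
    {"x1": 5.0, "x2": 4.0, "x3": 1.0},  # another algorithm, scores on a different scale
]


def toy_error(variables: List[str]) -> float:
    # Hypothetical validation error: losing x1 or x2 hurts the model; x3 is irrelevant.
    return 1.0 + (0.0 if "x1" in variables else 5.0) + (0.0 if "x2" in variables else 2.0)


combined = combine_importances(scores)
print(combined)
print(backward_elimination(combined, toy_error))  # keeps x1 and x2
```

Normalising before averaging keeps an algorithm whose raw scores happen to be on a larger scale from dominating the ensemble ranking. A weighted average or a rank-based combination would be equally plausible alternatives.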
