In Gaussian process models that use an Automatic Relevance Determination (ARD) kernel, the importance of a feature is inversely proportional to its length scale, so features can be selected by ranking them by importance. Existing ARD-based feature selection methods, however, share no uniform score for quantifying how much of the output variation a feature subset explains. This study proposes two feature selection approaches, each built on a cumulative feature importance score, the derivative decomposition ratio and the normalized sensitivity respectively, to determine the optimal feature subset. The approaches are assessed on whether they accurately identify irrelevant features and whether they rank features correctly. They are then applied to identify the relevant dimensionless inputs for a hybrid model that estimates the liquid entrainment fraction in two-phase flow. The results show that the proposed methods can identify the optimal feature subset for the hybrid model without significantly worsening its root mean squared error.
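The idea of ranking by ARD length scale and selecting a subset with a cumulative score can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the score is simply the normalized inverse length scale, which is neither the derivative decomposition ratio nor the normalized sensitivity proposed in the study, and the function name `rank_by_ard`, the feature names, the length-scale values, and the 0.95 threshold are all hypothetical.

```python
def rank_by_ard(length_scales, names, threshold=0.95):
    """Rank features by inverse ARD length scale and select a cumulative subset.

    Illustrative only: uses normalized 1/length_scale as the importance score,
    not the scores defined in the study.
    """
    if len(length_scales) != len(names):
        raise ValueError("length_scales and names must have the same length")
    if any(ls <= 0 for ls in length_scales):
        raise ValueError("length scales must be positive")

    # A shorter length scale means the output varies faster along that feature,
    # so the feature is treated as more relevant.
    raw = [1.0 / ls for ls in length_scales]
    total = sum(raw)
    scores = {name: r / total for name, r in zip(names, raw)}

    # Sort features from most to least important.
    ranking = sorted(scores, key=scores.get, reverse=True)

    # Add features in ranked order until the cumulative score reaches the threshold.
    selected, cumulative = [], 0.0
    for name in ranking:
        selected.append(name)
        cumulative += scores[name]
        if cumulative >= threshold:
            break
    return ranking, selected, cumulative


# Hypothetical fitted length scales for three inputs (assumed values).
ranking, selected, cumulative = rank_by_ard(
    [0.5, 2.0, 100.0], ["x1", "x2", "x3"], threshold=0.95
)
print(ranking, selected, round(cumulative, 3))
```

With these assumed values, the feature with the very large length scale (`x3`) contributes almost nothing to the cumulative score and is left out of the selected subset.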