Industrial data are usually collinear, which can cause purely data-driven sparse learning to deselect physically relevant variables and select collinear surrogates instead. This paper proposes a novel two-step learning approach that retains knowledge-informed variables (KIVs) when building inferential models. The first step is an improved knowledge-informed Lasso (KILasso) algorithm that removes the penalty on the KIVs, producing a series of candidate subsets that are guaranteed to retain the KIVs. In the second step, KILasso or ridge regression is run again on the candidate subsets to select the best sets of variables and estimate the final model. Two new algorithms are proposed and applied to datasets from an industrial boiler process and the Dow Chemical challenge problem. The results show that some important physically relevant variables are deselected by purely data-driven sparse methods but are retained by the proposed knowledge-informed methods, which also achieve superior prediction performance.
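The two-step idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a squared-error Lasso objective, handles the unpenalized KIVs by projecting them out before a plain coordinate-descent Lasso, and picks the final subset by hold-out error with a ridge refit. All function names (`ki_lasso`, `candidate_subsets`, `select_and_refit`) and the selection criterion are hypothetical choices made for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                  # residual for b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]                   # remove column j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                   # add the updated contribution back
    return b

def ki_lasso(X, y, kiv, lam):
    """Lasso with no penalty on the columns listed in kiv (assumed formulation)."""
    n, p = X.shape
    kiv = np.asarray(kiv, dtype=int)
    free = np.setdiff1d(np.arange(p), kiv)
    Z = np.column_stack([np.ones(n), X[:, kiv]])  # intercept + KIVs, unpenalised

    def proj(A):
        # Residual of A after least-squares regression on Z
        return A - Z @ np.linalg.lstsq(Z, A, rcond=None)[0]

    b_free = lasso_cd(proj(X[:, free]), proj(y), lam)
    # Recover the unpenalised coefficients from the partial residual
    a = np.linalg.lstsq(Z, y - X[:, free] @ b_free, rcond=None)[0]
    coef = np.zeros(p)
    coef[free] = b_free
    coef[kiv] = a[1:]
    return a[0], coef

def candidate_subsets(X, y, kiv, lambdas):
    """Step 1: one candidate subset per penalty level; each contains the KIVs."""
    subsets = []
    for lam in lambdas:
        _, coef = ki_lasso(X, y, kiv, lam)
        s = tuple(sorted({int(i) for i in np.flatnonzero(coef)} | {int(i) for i in kiv}))
        if s not in subsets:
            subsets.append(s)
    return subsets

def ridge_fit(X, y, alpha):
    """Ridge regression on centred data; returns (intercept, weights)."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return ym - xm @ w, w

def select_and_refit(X_tr, y_tr, X_va, y_va, subsets, alpha=1.0):
    """Step 2 (assumed criterion): keep the subset with the lowest hold-out MSE."""
    best = None
    for s in subsets:
        cols = list(s)
        b0, w = ridge_fit(X_tr[:, cols], y_tr, alpha)
        mse = float(np.mean((y_va - (b0 + X_va[:, cols] @ w)) ** 2))
        if best is None or mse < best[0]:
            best = (mse, s, b0, w)
    return best
```

Projecting out the KIVs is equivalent to leaving their coefficients unpenalized under this objective, so every candidate subset keeps them regardless of how collinear the remaining variables are.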