Background
This work introduces the Enhanced Data Point Importance (EDPI) method, a systematic approach for evaluating the importance of data points in multivariate calibration. Factor decomposition methods make it possible to evaluate how much each variable contributes to maintaining the structural pattern of the data in the abstract space. Essential data points play a key role in these patterns, and the Data Point Importance (DPI) method ranks the essential data points by their importance; all other points receive a score of zero. In this contribution, DPI is extended to the inner points, so that their importance is evaluated in the absence of the essential points. The EDPI method employs convex peeling to sort the data points systematically.

Results
The EDPI method was applied to near-infrared and Raman spectroscopy data sets, including corn and alcohol-mixture data sets as well as simulated data, to rank and select important variables. EDPI effectively identified the variables that contributed to preserving the data structure and highlighted key spectral regions with different degrees of selectivity. In the alcohol data set, EDPI revealed important physicochemical insights by focusing on specific regions where the non-analyte spectra overlapped. It performed similarly to the Variable Importance in Projection (VIP) method while selecting fewer variables.

Significance
Partial least squares calibration of the near-infrared and Raman spectroscopic data sets highlights the effectiveness of the proposed EDPI strategy for variable selection when contrasted with the conventional VIP method.
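
The convex peeling step mentioned in the Background can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the data points have already been projected onto two abstract factors (for example, two score vectors), and it only assigns each point the index of the convex layer it falls on. The function names `convex_hull_indices` and `convex_peeling_layers` are hypothetical, and the importance scores that DPI and EDPI compute within each layer are not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points: Sequence[Point], subset: List[int]) -> List[int]:
    """Indices of the convex hull vertices of points[subset] (Andrew's monotone chain)."""
    order = sorted(subset, key=lambda i: points[i])
    if len(order) <= 2:
        return order

    def half_hull(seq) -> List[int]:
        chain: List[int] = []
        for i in seq:
            # Drop the last vertex while it does not make a strict left turn.
            while len(chain) >= 2 and _cross(points[chain[-2]], points[chain[-1]], points[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    # The last vertex of each half is the first vertex of the other half.
    return lower[:-1] + upper[:-1]


def convex_peeling_layers(points: Sequence[Point]) -> List[int]:
    """Layer index per point: 1 for the outermost hull, 2 for the next hull, and so on."""
    remaining = list(range(len(points)))
    layer = [0] * len(points)
    depth = 0
    while remaining:
        depth += 1
        hull = set(convex_hull_indices(points, remaining))
        for i in hull:
            layer[i] = depth
        # Peel the current hull off and repeat on the inner points.
        remaining = [i for i in remaining if i not in hull]
    return layer


if __name__ == "__main__":
    # Toy data (not from the paper): an outer square, an inner square, and a centre point.
    toy = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
    print(convex_peeling_layers(toy))  # [1, 1, 1, 1, 2, 2, 2, 2, 3]
```

In this sketch the points on layer 1 play the role of the essential points, and the deeper layers correspond to the inner points that become hull vertices once the outer layers are removed. Abstract spaces with more than two factors would need a general-dimension hull routine instead of the planar one used here.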