Spectroscopic data can be distorted by artefacts arising from physical, chemical and environmental factors that are of no interest for the characterization of the samples under study. For example, data acquired by near-infrared (NIR) spectroscopy in diffuse reflectance mode can be affected by light scattering. If this artefact is not reduced or removed by spectral pre-processing, it can complicate the multivariate data analysis. Different pre-processing approaches correct these effects in different ways: differentiation can reveal underlying bands, while spectral normalization techniques such as standard normal variate (SNV) can correct for multiplicative and additive effects. Combining multiple pre-processing techniques can lead to better results, but it is not feasible for a user to explore all possible combinations. In the present work, a new pre-processing fusion approach is demonstrated, based on the framework of separating common and distinct components in multi-block multivariate data analysis. The approach uses parallel and orthogonalized partial least squares (PO-PLS) regression for the parallel fusion of multiple pre-processing techniques applied to the same data. Results obtained on four benchmark NIR spectroscopic data sets related to the assessment of fruit quality are compared with those of the recently developed sequential pre-processing through orthogonalization (SPORT) approach. In all cases, the PO-PLS approach performed slightly better. Furthermore, a clear understanding was obtained of the common and distinct information present in the data sets after each pre-treatment. The proposed approach, parallel pre-processing through orthogonalization (PORTO), can be seen as parallel boosting of multiple pre-processing techniques to improve model performance.
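As an illustration of the statement that SNV can correct for multiplicative and additive effects, the following minimal sketch applies SNV to a spectrum and to a copy of it distorted by a gain and an offset. It is not taken from the paper: the function name `snv`, the five-point spectrum, and the gain and offset values are all hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator), an assumption
    return [(v - m) / s for v in spectrum]


# Hypothetical 5-point spectrum (illustrative values, not real NIR data)
reference = [0.10, 0.25, 0.60, 0.30, 0.15]

# The same spectrum with a scatter-like distortion:
# multiplicative gain 1.8 and additive offset 0.4 (both hypothetical)
distorted = [1.8 * v + 0.4 for v in reference]

snv_reference = snv(reference)
snv_distorted = snv(distorted)

# Largest pointwise difference between the two corrected spectra
max_difference = max(abs(a - b) for a, b in zip(snv_reference, snv_distorted))
print(max_difference)
```

Because SNV subtracts each spectrum's own mean and divides by its own standard deviation, a constant offset and a positive constant gain cancel, so the two corrected spectra agree up to floating-point error. Real scatter effects are often wavelength-dependent, which is one reason a single technique may not suffice and combinations are of interest.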