Abstract

Missing values often arise during the collection of chemical data, for a variety of reasons. They destroy the multilinear structure of the data, so traditional multi-way calibration algorithms cannot be applied directly. This work therefore proposes a novel second-order calibration algorithm, PARAFAC for missing values (PARAFACM), which directly handles three-way data arrays with missing values. The algorithm combines the iterative least squares principle with the idea of treating a multivariate linear regression as a set of individual univariate linear regressions. PARAFACM is open source and available in the Supplementary materials. The proposed algorithm and three existing algorithms, weighted PARAFAC (WPARAFAC), Incomplete Data PARAFAC (INDAFAC), and PARAFAC with single imputation (PARAFACSI), were applied to simulated three-way data arrays under three missing-data situations (random missing points, random missing channels, and scattering removal) at different missing ratios and noise levels. The comparison shows that the proposed algorithm is more broadly applicable across the three missing-data situations and iterates faster. Moreover, its overall performance is better than that of the other algorithms across the tested missing ratios and noise levels. The proposed algorithm was also used to decompose a semi-simulated HPLC-DAD data array with missing time channels and a real EEM data array with scattering removed, and gave satisfactory quantitative results in both cases. Overall, the proposed algorithm can efficiently and accurately handle a variety of three-way data arrays with missing values, and can extract qualitative and quantitative information on the components of interest in the presence of unknown interferences.
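The abstract does not spell out the PARAFACM update rules, so the following is only a minimal sketch of the general idea it names: fit a trilinear model by iterative least squares, with each factor element estimated by its own univariate regression over the measured entries only. It is not the authors' implementation. The function names (`predict`, `observed_loss`, `fit_masked_trilinear`), the element-wise update order, the random initialization, and the toy data are all assumptions made for illustration.

```python
import random

def predict(factors, idx, skip=None):
    """Trilinear model value at idx = (i, j, k); component `skip` is left out."""
    rank = len(factors[0][0])
    return sum(
        factors[0][idx[0]][r] * factors[1][idx[1]][r] * factors[2][idx[2]][r]
        for r in range(rank) if r != skip
    )

def observed_loss(factors, observed):
    """Sum of squared residuals over the measured entries only."""
    return sum((x - predict(factors, idx)) ** 2 for idx, x in observed.items())

def fit_masked_trilinear(observed, shape, rank, n_iter=200, seed=0):
    """observed maps (i, j, k) -> value; missing entries are simply absent."""
    rng = random.Random(seed)
    # Assumed initialization: small positive random factors for each mode.
    factors = [[[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
               for n in shape]
    # Group the measured entries by their index in each mode.
    by_mode = [[[] for _ in range(n)] for n in shape]
    for idx, x in observed.items():
        for m in range(3):
            by_mode[m][idx[m]].append((idx, x))
    for _ in range(n_iter):
        for m in range(3):
            o1, o2 = [n for n in range(3) if n != m]
            for p in range(shape[m]):
                for r in range(rank):
                    # Univariate regression without intercept: regress the
                    # partial residual y on the regressor z, measured entries only.
                    num = den = 0.0
                    for idx, x in by_mode[m][p]:
                        z = factors[o1][idx[o1]][r] * factors[o2][idx[o2]][r]
                        y = x - predict(factors, idx, skip=r)
                        num += z * y
                        den += z * z
                    if den > 0.0:
                        factors[m][p][r] = num / den
    return factors

# Toy example (assumed data): a noiseless one-component 4 x 3 x 3 array
# with seven entries hidden from the fit.
a, b, c = [1.0, 2.0, 0.5, 1.5], [0.3, 1.0, 0.7], [1.2, 0.4, 0.9]
full = {(i, j, k): a[i] * b[j] * c[k]
        for i in range(4) for j in range(3) for k in range(3)}
hidden = {idx for idx in full if sum(idx) % 5 == 0}
observed = {idx: x for idx, x in full.items() if idx not in hidden}
factors = fit_masked_trilinear(observed, (4, 3, 3), rank=1)
worst = max(abs(full[idx] - predict(factors, idx)) for idx in hidden)
print("largest error on hidden entries:", worst)
```

Because every update minimizes the squared error over the measured entries with respect to a single factor element, the fit on those entries cannot get worse from one update to the next. Missing entries never enter the sums, so no imputation step is needed in this sketch.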
