Infrared (IR) spectroscopy combined with multivariate calibration is a potential method for the quantitative analysis of polycyclic aromatic hydrocarbons (PAHs) in soil, providing rapid data support for soil risk assessment. However, IR spectra contain a large amount of irrelevant information, which degrades the predictive performance of calibration models. Variable selection is an effective strategy for eliminating irrelevant wavelengths and improving predictive performance. In this study, IR spectroscopy combined with partial least squares (PLS) was used to quantify anthracene and fluoranthene in soil. To improve the predictive performance of the PLS calibration model, synergy interval PLS (siPLS) was first applied as a “rough selection” step to identify feature bands; a “fine selection” step was then performed on these bands to extract feature variables. For the fine selection, three variable selection methods were compared for their ability to extract effective variables: the successive projection algorithm (SPA), the genetic algorithm (GA), and particle swarm optimization (PSO). The results show that the siPLS-GA calibration model achieved the lowest root mean square error (RMSE) and the highest coefficient of determination (R2). External validation confirmed the excellent predictive performance of the siPLS-GA model, with R2 = 0.9830 and RMSE = 0.5897 mg/g for anthracene, and R2 = 0.9849 and RMSE = 0.4739 mg/g for fluoranthene. In summary, siPLS combined with GA can accurately extract the effective information of the target substances and improve the predictive performance of IR-based PLS calibration models.
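The two-stage “rough selection, then fine selection” idea can be sketched as follows. This is a minimal numpy-only illustration under stated assumptions, not the authors' implementation: PLS1 is fitted with the NIPALS algorithm, candidate variable subsets are scored by 5-fold cross-validated RMSE (RMSECV), siPLS exhaustively searches all pairs of 10 equal-width intervals, and a simple GA (tournament selection, one-point crossover, bit-flip mutation, elitist survival) then refines the variables inside the chosen intervals. The interval count, the number of combined intervals, the three latent variables, the GA settings, and all helper names (`pls1_fit`, `cv_rmse`, `sipls`, `ga_select`, etc.) are hypothetical choices, and the data are synthetic. SPA and PSO are not shown.

```python
import itertools
import numpy as np

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1: returns (coef, x_mean, y_mean), y_hat = (X - x_mean) @ coef + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:                           # nothing left to explain
            break
        w = w / norm                               # X weights
        t = E @ w                                  # scores
        p, q = E.T @ t / (t @ t), f @ t / (t @ t)  # X and y loadings
        E, f = E - np.outer(t, p), f - q * t       # deflation
        W.append(w); P.append(p); Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q)), x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

def cv_rmse(X, y, n_comp, n_folds=5):
    """RMSECV from interleaved n_folds-fold cross-validation."""
    idx, pred = np.arange(len(y)), np.empty(len(y))
    for k in range(n_folds):
        test = idx[k::n_folds]
        train = np.setdiff1d(idx, test)
        pred[test] = pls1_predict(pls1_fit(X[train], y[train], n_comp), X[test])
    return rmse(y, pred)

def sipls(X, y, n_intervals=10, n_combine=2, n_comp=3):
    """Rough selection: best combination of n_combine equal-width intervals by RMSECV."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    best_cols, best_score = None, np.inf
    for combo in itertools.combinations(range(n_intervals), n_combine):
        cols = np.concatenate([intervals[i] for i in combo])
        score = cv_rmse(X[:, cols], y, n_comp)
        if score < best_score:
            best_cols, best_score = cols, score
    return best_cols, best_score

def ga_select(X, y, n_comp=3, pop=20, gens=30, p_mut=0.05, seed=0):
    """Fine selection: binary-mask GA minimising RMSECV; returns (column indices, RMSECV)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    def fitness(mask):
        return cv_rmse(X[:, mask], y, n_comp) if mask.sum() >= n_comp else np.inf
    popn = rng.random((pop, n)) < 0.5
    scores = np.array([fitness(m) for m in popn])
    for _ in range(gens):
        kids = []
        for _ in range(pop):
            a, b = rng.integers(pop, size=2), rng.integers(pop, size=2)
            p1, p2 = popn[a[np.argmin(scores[a])]], popn[b[np.argmin(scores[b])]]  # tournaments
            cut = rng.integers(1, n)                     # one-point crossover
            kid = np.concatenate([p1[:cut], p2[cut:]])
            kids.append(kid ^ (rng.random(n) < p_mut))   # bit-flip mutation
        kids = np.array(kids)
        everyone = np.vstack([popn, kids])
        all_scores = np.concatenate([scores, [fitness(m) for m in kids]])
        keep = np.argsort(all_scores)[:pop]              # elitist survival
        popn, scores = everyone[keep], all_scores[keep]
    return np.flatnonzero(popn[0]), scores[0]

# Synthetic demo (NOT the paper's data): 60 "spectra" x 100 wavelengths,
# with the analyte signal confined to columns 20-24.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 100))
y = X[:, 20:25].sum(axis=1) + 0.1 * rng.normal(size=60)
X_cal, y_cal, X_val, y_val = X[:45], y[:45], X[45:], y[45:]

rough_cols, _ = sipls(X_cal, y_cal)                      # siPLS "rough selection"
fine_idx, _ = ga_select(X_cal[:, rough_cols], y_cal)     # GA "fine selection"
cols = rough_cols[fine_idx]
model = pls1_fit(X_cal[:, cols], y_cal, n_comp=3)
y_hat = pls1_predict(model, X_val[:, cols])
print(f"{len(cols)} variables, R2 = {r2(y_val, y_hat):.4f}, RMSE = {rmse(y_val, y_hat):.4f}")
```

The GA only searches inside the siPLS-selected bands, which keeps its search space small; the held-out validation samples are never used by either selection stage, so the final R2 and RMSE play the role of the external validation reported above. On real spectra, the number of latent variables would normally be chosen by cross-validation as well rather than fixed at three.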