The ultimate analysis parameters of biomass, namely carbon (C), hydrogen (H), nitrogen (N), and oxygen (O) content, have rarely been predicted by non-destructive methods to date. In this research, we developed partial least squares regression (PLSR) models to predict the ultimate analysis parameters of chip biomass from near-infrared (NIR) spectra of non-wood and wood samples obtained from fast-growing trees and agricultural residues, using raw spectra and nine different traditional spectral preprocessing techniques. These techniques include first derivative (sd1), second derivative (sd2), constant offset, standard normal variate (SNV), multiplicative scatter correction (MSC), vector normalization, min-max normalization, mean centering, sd1 + vector normalization, and sd1 + MSC. Additionally, we employed a genetic algorithm (GA), successive projection algorithm (SPA), multi-preprocessing (MP) 5-range, and MP 3-range to develop PLSR models for rapid prediction. A dataset of 120 chip biomass samples, comprising 65–67% non-wood samples and 33–35% wood samples, was used for model development, and model performance was evaluated and compared. The optimum model was selected mainly on the basis of the coefficient of determination of the prediction set (R2P), the root mean square error of the prediction set (RMSEP), and the ratio of prediction to deviation (RPD). The optimal model for the weight percentage (wt.%) of C was obtained using GA–PLSR, yielding R2P, RMSEP, and RPD values of 0.6954, 1.1252 wt.%, and 1.8, respectively. Similarly, for wt.% of O, the most effective model was obtained using the MP PLSR 5-range method, with R2P of 0.7150, RMSEP of 1.3088 wt.%, and RPD of 1.9. For wt.% of N, the optimal model was obtained using the MP PLSR 3-range method, yielding R2P, RMSEP, and RPD values of 0.6073, 0.1008 wt.%, and 1.6, respectively.
However, the model for wt.% of H provided R2P, RMSEP, and RPD values of 0.5162, 0.2322 wt.%, and 1.5, respectively. Notably, the limit of quantification (LOQ) values for C, H, and O were lower than the minimum reference values used during model development, indicating a high level of sensitivity. However, the LOQ for N exceeded the minimum reference value, implying that samples to be predicted by the model must fall within the reference range of the calibration set. The effect of combining non-wood and wood spectra of biomass chips on the rapid prediction of ultimate analysis parameters by NIR spectroscopy was investigated by scatter plot analysis. For different species to be included in one model, they must not only differ in constituent values, which widens the range and makes the model more robust, but must also share similar trend-line characteristics in the scatter plot, i.e., correlation coefficient (R), slope, and intercept: the slopes should be the same and approach 1, the intercepts should be the same (no gap) and approach zero, and R should be high and approach 1. The effect of R, slope, and intercept on obtaining a better-optimized model was studied. The results show that the different species affected the prediction performance for each parameter in different ways, and scatter plot analysis indicated which species affected the model negatively and how the model could be improved. This is the first time this effect has been studied using the principle of a scatter plot.
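The evaluation criteria and the trend-line characteristics named above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common definitions R2P = 1 - SSE/SST, RMSEP as the root of the mean squared residual, and RPD as the sample standard deviation of the reference values divided by RMSEP (some authors divide by the bias-corrected SEP instead). The function names and the numbers in the usage lines are hypothetical and purely illustrative.

```python
import math
from statistics import mean, stdev


def prediction_metrics(y_ref, y_pred):
    """Return R2P, RMSEP and RPD for a prediction set (assumed definitions)."""
    residuals = [p - r for r, p in zip(y_ref, y_pred)]
    ss_res = sum(e * e for e in residuals)            # sum of squared errors
    ref_mean = mean(y_ref)
    ss_tot = sum((r - ref_mean) ** 2 for r in y_ref)  # total sum of squares
    rmsep = math.sqrt(ss_res / len(y_ref))
    return {
        "R2P": 1.0 - ss_res / ss_tot,
        "RMSEP": rmsep,
        "RPD": stdev(y_ref) / rmsep,  # assumption: SD of reference / RMSEP
    }


def trend_line(y_ref, y_pred):
    """Least-squares line of predicted vs. reference values: R, slope, intercept."""
    x_mean, y_mean = mean(y_ref), mean(y_pred)
    sxx = sum((x - x_mean) ** 2 for x in y_ref)
    syy = sum((y - y_mean) ** 2 for y in y_pred)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(y_ref, y_pred))
    slope = sxy / sxx
    return {
        "R": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": y_mean - slope * x_mean,
    }


# Hypothetical reference and predicted values (wt.%) for two sample groups.
groups = {
    "non-wood": ([44.1, 45.3, 46.0, 47.2], [44.6, 45.1, 46.4, 46.8]),
    "wood": ([47.5, 48.2, 49.0, 49.9], [47.9, 48.0, 49.3, 49.5]),
}
for name, (ref, pred) in groups.items():
    print(name, trend_line(ref, pred))
```

Computing the trend line separately for each group, as in the loop, gives one way to compare slopes, intercepts, and R between wood and non-wood samples before pooling them into a single model.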