A chemistry-based explainable machine learning model based on NIR spectra for predicting wood properties and understanding wavelength selection

Laurence Schimleck, Samuel Ayanleye, Stavros Avramidis, Vahid Nasir

doi:10.1080/17480272.2023.2265349

https://doi.org/10.1080/17480272.2023.2265349

Abstract

A chemistry-based explainable machine learning (ML) approach was used to predict wood properties from near infrared (NIR) spectral data collected from rough and smooth surfaces, and to provide a better understanding of the role of important NIR wavelengths (features) in the performance of ML models. NIR spectra collected from western hemlock (Tsuga heterophylla) and coastal Douglas-fir (Pseudotsuga menziesii) boards with rough and smooth surfaces were fed into random forest and TreeNet, a gradient boosting machine algorithm, to predict wood density, modulus of elasticity (MOE) and modulus of rupture (MOR). The TreeNet model predicted MOE, MOR, and density with R² of 0.66, 0.64, and 0.64 using spectra collected from rough surfaces and R² of 0.54, 0.46, and 0.46 using spectra collected from smooth surfaces. TreeNet outperformed random forest, and for both ML algorithms higher R² and lower error were obtained using NIR data collected from rough surfaces. This suggested that for Douglas-fir and western hemlock, NIR spectra could be collected on a sawn surface prior to surface planing. However, it is difficult to generalize the impact of surface roughness on the performance of predictive models, as different factors (e.g. what constitutes a smooth or rough surface, variability of the data set in terms of wood properties) affect their success. The NIR features having the greatest influence on TreeNet models were examined and consistently had wood-chemistry-specific band assignments. The most important features occurred in the O–H first overtone and C–H second overtone regions and in a narrow zone (approximately 2400–2500 nm) of the C–H stretch + C–C stretch combination region. Important features also differed by property and with surface roughness.
Explaining ML model performance using the relative importance of the NIR features showed the importance of wood-chemistry-related information when developing models; however, MOE and MOR TreeNet models based on smooth-surface NIR spectra showed an increased importance of water-related features. Overall, the chemistry-based explainable ML approach allows important NIR features and regions to be identified, and aids in understanding how they contribute to the performance of NIR-based wood property predictive models.
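
The workflow summarized in the abstract (fit tree-ensemble regressors to spectra, compare R², then rank wavelengths by relative importance) can be illustrated with a minimal sketch. This is not the authors' pipeline: TreeNet is a commercial gradient boosting implementation, so scikit-learn's GradientBoostingRegressor is swapped in as a stand-in, and the spectra, the wavelength grid, and the two "informative" bands (1450 nm and 2450 nm) are synthetic assumptions chosen only so the example runs end to end.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical wavelength grid in nm (assumption, not the instrument's grid).
wavelengths = np.arange(1100, 2500, 10)
n_samples = 300

# Synthetic "spectra": random values standing in for NIR absorbances.
X = rng.normal(size=(n_samples, wavelengths.size))

# Synthetic target driven by two arbitrarily chosen bands (illustration only).
informative_nm = (1450, 2450)
idx = [int(np.argmin(np.abs(wavelengths - w))) for w in informative_nm]
y = 2.0 * X[:, idx[0]] - 1.5 * X[:, idx[1]] + rng.normal(scale=0.3, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    # Stand-in for TreeNet; hyperparameters are scikit-learn defaults.
    "gradient_boosting": GradientBoostingRegressor(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    # Impurity-based relative importance, one value per wavelength.
    top = np.argsort(model.feature_importances_)[::-1][:5]
    results[name] = {"r2": r2, "top_wavelengths_nm": wavelengths[top].tolist()}
    print(f"{name}: R2={r2:.2f}, top wavelengths (nm)={results[name]['top_wavelengths_nm']}")
```

With real spectra, the top-ranked wavelengths would then be compared against published band assignments (for example O–H or C–H overtone regions), which is the step that links feature importance back to wood chemistry.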
