Building machine learning models to identify wood species based on near-infrared spectroscopy

Li Luo, Zhao-Jun Xu, Bin Na

doi:10.1515/hf-2022-0122

Abstract

Efficient and nondestructive technology for identifying wood species facilitates the transition from digital forestry to smart forestry. While the application of near-infrared spectroscopy to wood identification is well documented, the detailed chemometric mechanisms remain unclear. In this study, twelve wood species were identified by using near-infrared spectroscopy combined with six machine learning algorithms (support vector machine, logistic regression, naïve Bayes, k-nearest neighbors, random forest, and artificial neural network). First, isolation forest and local outlier factor were used to detect and exclude outliers. Then feature engineering strategies were developed from three perspectives to process the feature matrices: feature selection, feature extraction, and feature selection combined with feature extraction. Next, the learning curve, grid search method, and K-fold cross-validation were used to optimize the model parameters. Finally, accuracy, operation time, and the confusion matrix were used to evaluate model performance. When the local outlier factor was used to remove outliers and principal component analysis was used to extract features, the support-vector-machine-based wood-species identification model produced the most accurate results, with 98.24% accuracy. These results offer new avenues for constructing automatic wood-identification systems.
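
The best-performing workflow described above (local outlier factor, then principal component analysis, then a support vector machine tuned by grid search with K-fold cross-validation) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the spectra are synthetic stand-ins generated by `make_classification`, and the contamination rate, parameter grid, number of folds, and train/test split are all assumed values chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_SPECIES = 12  # twelve wood species, as in the study

# Synthetic stand-in for NIR spectra (assumption): 600 samples x 200 "wavelengths".
X, y = make_classification(
    n_samples=600, n_features=200, n_informative=30, n_redundant=20,
    n_classes=N_SPECIES, n_clusters_per_class=1, random_state=0,
)

# Step 1: outlier removal with local outlier factor.
# fit_predict returns 1 for inliers and -1 for outliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)  # assumed settings
inlier_mask = lof.fit_predict(X) == 1
X_clean, y_clean = X[inlier_mask], y[inlier_mask]

X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.25, stratify=y_clean, random_state=0
)

# Steps 2 and 3: PCA feature extraction feeding an SVM classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC(kernel="rbf")),
])

# Step 4: grid search with stratified K-fold cross-validation (assumed grid).
param_grid = {
    "pca__n_components": [10, 20, 40],
    "svc__C": [1, 10, 100],
    "svc__gamma": ["scale", 0.01],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

# Step 5: evaluate on held-out data with accuracy and a confusion matrix.
y_pred = search.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=np.arange(N_SPECIES))

print("removed outliers:", int((~inlier_mask).sum()))
print("best parameters:", search.best_params_)
print("test accuracy:", round(acc, 4))
```

Placing the scaler and PCA inside the `Pipeline` means they are refitted on each training fold during cross-validation, so no information from the validation fold leaks into feature extraction. Accuracy on this synthetic data says nothing about the accuracy reported in the study.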

Full Text