A novel two‐phase near‐infrared and midinfrared wavelength selection framework for sample classification

Juliana Fontes, Guilherme B Bucco, João B G Brito, Michel J Anzanello

doi:10.1002/cem.3536

Abstract

Spectral data describing product samples are typically composed of a large number of noisy and irrelevant wavelengths that tend to undermine the performance of multivariate predictive techniques. This paper proposes a two‐phase framework that integrates a wavelength preselection step, guided by wavelength clustering, with a wrapper‐based strategy. The first phase prunes the data by removing the less informative wavelengths using spectral clustering, a technique deemed suitable for the Fourier transform infrared (FTIR) spectroscopy and near‐infrared (NIR) spectroscopy data at hand. The preselected wavelengths then undergo a second selection phase based on the combination of different wavelength importance indices (i.e., Bhattacharyya distance, Chi‐square, ReliefF, and Gini) and classification techniques (i.e., support vector machine, k‐nearest neighbors, and random forest). When applied to 11 FTIR datasets from different domains, the recommended combination of importance index and classifier increased the average accuracy by 6.37% in relative terms (from 0.863 to 0.918), while retaining on average 3.84% of the original spectra. The framework also reduced the computational time of the selection process.
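
The two‐phase idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. Several simplifications are assumed: a greedy correlation grouping stands in for spectral clustering in phase 1, a two‐class Fisher‐style score stands in for the importance indices named in the abstract, the best‐scoring wavelength is kept from each group, and a 1‐nearest‐neighbor classifier with leave‐one‐out accuracy plays the role of the wrapper's classifier. The spectra are synthetic.

```python
import random
from statistics import mean, pstdev


def column(X, j):
    """Return wavelength j as a list of values across samples."""
    return [row[j] for row in X]


def pearson(a, b):
    """Pearson correlation between two equally long value lists."""
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    ma, mb = mean(a), mean(b)
    return mean([(u - ma) * (v - mb) for u, v in zip(a, b)]) / (sa * sb)


def fisher_score(values, labels):
    """Assumed importance index: squared class-mean gap over summed class variances."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    return (mean(a) - mean(b)) ** 2 / (pstdev(a) ** 2 + pstdev(b) ** 2 + 1e-12)


def preselect(X, y, threshold=0.9):
    """Phase 1 stand-in: group correlated wavelengths, keep the best-scoring one per group."""
    cols = [column(X, j) for j in range(len(X[0]))]
    groups = []  # each group is a list of wavelength indices; element 0 is its seed
    for j in range(len(cols)):
        for g in groups:
            if abs(pearson(cols[g[0]], cols[j])) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return [max(g, key=lambda j: fisher_score(cols[j], y)) for g in groups]


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on the given wavelengths."""
    hits = 0
    for i, row in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in subset),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def wrapper_select(X, y, candidates):
    """Phase 2: rank candidates by importance, keep the top-k prefix with the best accuracy."""
    ranked = sorted(candidates, key=lambda j: fisher_score(column(X, j), y), reverse=True)
    best_subset, best_acc = [], -1.0
    for k in range(1, len(ranked) + 1):
        acc = loo_accuracy(X, y, ranked[:k])
        if acc > best_acc:
            best_subset, best_acc = ranked[:k], acc
    return best_subset, best_acc


def make_toy_spectra(n_per_class=20, n_wl=30, informative=(10, 11, 12), seed=0):
    """Synthetic two-class spectra: a shared baseline plus a class shift on a few wavelengths."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            base = rng.gauss(0.0, 1.0)
            row = [base + rng.gauss(0.0, 0.1) for _ in range(n_wl)]
            for j in informative:
                row[j] += 4.0 * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_spectra()
kept = preselect(X, y)                    # phase 1: pruned candidate wavelengths
subset, acc = wrapper_select(X, y, kept)  # phase 2: wrapper over the candidates
print(f"kept {len(kept)} of {len(X[0])} wavelengths; subset {subset}, LOO accuracy {acc:.2f}")
```

The point of the sketch is the division of labor: the inexpensive grouping step shrinks the candidate set before the wrapper runs, so the costly classifier evaluations are repeated over far fewer wavelengths. This is consistent with the reduction in computational time reported above.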

Full Text