Information on the geographical origin of coal provides strong evidence for evaluating coal quality and supports the inspection of imported and exported coal. Because traditional identification methods rely on a series of instruments that incur considerable time and cost, an efficient and effective method for discriminating coal origin is needed. As a rapid, non-destructive, and reagent-free analytical method, near-infrared spectroscopy (NIRS) enables qualitative and quantitative characterization of a wide variety of materials, such as food and fuel. In this work, different multivariate data analysis approaches based on NIRS data are investigated to identify the geographical origin of coal. Because raw spectra are high-dimensional, principal component analysis (PCA), isometric mapping (Isomap), and linear discriminant analysis (LDA) are introduced to extract features. However, standard LDA has difficulty separating classes with similar centers because it overweights the contributions of the edge classes to the between-class scatter matrix. To address this concern, we propose an improved LDA (iLDA) that accounts for the contribution of each class and increases the influence of classes with similar centers. In addition, we combine PCA and iLDA to solve the small sample size problem. The experimental results show that nonlinear approaches generally outperform linear approaches, and that kernel partial least squares discriminant analysis combined with PCA-iLDA provides the best performance, with an accuracy of 97.21%. These results indicate that NIRS in tandem with machine learning algorithms is promising for the rapid and accurate identification of coal geographical origin.
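The abstract does not give the iLDA formulation, so the following is only a minimal sketch of the underlying issue, not the authors' method. It writes the between-class scatter matrix as a sum over class pairs, which with a constant weight equals the standard LDA between-class scatter, and then applies an assumed weight of 1/d² (d being the distance between two class means) so that distant "edge" classes no longer dominate. The function names, the toy data, and the choice of weight are all hypothetical.

```python
from itertools import combinations
from math import dist


def class_means(X, y):
    """Return {label: mean vector} for samples X (list of lists) and labels y."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def pairwise_between_scatter(means, priors, weight):
    """S_b = sum over pairs i<j of p_i * p_j * w(d_ij) * (m_i - m_j)(m_i - m_j)^T."""
    dim = len(next(iter(means.values())))
    S = [[0.0] * dim for _ in range(dim)]
    for a, b in combinations(sorted(means), 2):
        diff = [u - v for u, v in zip(means[a], means[b])]
        w = priors[a] * priors[b] * weight(dist(means[a], means[b]))
        for r in range(dim):
            for c in range(dim):
                S[r][c] += w * diff[r] * diff[c]
    return S


def y_share(S):
    """Fraction of the trace of S that lies along the second feature axis."""
    return S[1][1] / (S[0][0] + S[1][1])


# Toy data (hypothetical): classes A and B have nearby centers that differ
# only along the second axis; class C is a far-away "edge" class.
X = [[-0.1, 0.0], [0.1, 0.0],    # class A, mean near (0, 0)
     [-0.1, 1.0], [0.1, 1.0],    # class B, mean near (0, 1)
     [9.9, 0.0], [10.1, 0.0]]    # class C, mean near (10, 0)
y = ["A", "A", "B", "B", "C", "C"]

means = class_means(X, y)
priors = {c: y.count(c) / len(y) for c in means}

# Constant weight: equivalent to the standard between-class scatter.
standard = pairwise_between_scatter(means, priors, lambda d: 1.0)
# Assumed illustrative weight: down-weight distant class pairs.
weighted = pairwise_between_scatter(means, priors, lambda d: 1.0 / d**2)

print("share of scatter along axis 2 (standard):", y_share(standard))
print("share of scatter along axis 2 (weighted):", y_share(weighted))
```

In this toy case the axis that separates the two similar classes carries only a small share of the standard scatter, and a much larger share once distant pairs are down-weighted. A full pipeline would still need the within-class scatter, the generalized eigenproblem, and a preceding PCA step to handle the small sample size problem.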