Abstract

The paper introduces two new methods, the cross-correlation method (CCM) and the two-dimensional correlation method (TDCM), for preprocessing Raman spectroscopy data from Chinese handmade paper samples. CCM expands the spectral dimension from 1×N to 1×(2N−1) by taking the cross-correlation between two spectra of the same category. TDCM comprises the two-dimensional synchronous correlation method (TDSCM) and the two-dimensional asynchronous correlation method (TDACM). Both expand the spectral dimension from 1×N to N×N. TDSCM takes the tensor product of two spectra of the same category, and TDACM takes the tensor product of one spectrum with the Hilbert transform of the other. The experimental data were preprocessed using baseline removal, CCM, TDSCM, and TDACM. Four machine learning models were used to evaluate the effects of these methods: principal component analysis (PCA) combined with linear regression (LR), support vector machine (SVM) combined with LR, k-nearest neighbors (KNN), and random forest (RF). For the PCA model, the R-squared values were nearly 1 for all types of data, indicating high accuracy. For the SVM-LR, KNN, and RF models, however, the R-squared values increased in the order raw data, baseline-removed data, CCM-preprocessed data, TDSCM-preprocessed data, and TDACM-preprocessed data. For TDACM-preprocessed data, the R-squared values of the KNN and RF models approached 1, indicating that machine-learning accuracy improved by nearly 100%. TDACM preprocessing therefore markedly improves the accuracy of supervised models such as KNN and RF, bringing them close to the level of unsupervised models such as PCA.
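The three feature expansions described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. CCM is taken here as the plain full cross-correlation (`numpy.correlate` with `mode="full"`), which turns two length-N spectra into a length 2N−1 vector. The tensor product is taken as the outer product. The Hilbert transform is assumed to be the discrete FFT-based transform applied along the spectral axis, that is, the imaginary part of the analytic signal. The function names `ccm`, `tdscm`, `tdacm`, and `hilbert_transform` are hypothetical. Any normalization, mean-centering, or pairing rule used in the paper is not reproduced.

```python
import numpy as np


def ccm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cross-correlation of two length-N spectra -> length 2N-1 vector.

    Assumption: un-normalized full cross-correlation; the paper may scale
    or center the spectra differently.
    """
    return np.correlate(a, b, mode="full")


def hilbert_transform(x: np.ndarray) -> np.ndarray:
    """Discrete Hilbert transform of a real 1-D signal via the FFT.

    Builds the analytic signal (positive frequencies doubled, negative
    frequencies zeroed) and returns its imaginary part.
    """
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC term unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0  # keep the Nyquist term unchanged
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights).imag


def tdscm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Synchronous map: outer (tensor) product of two spectra, N x N."""
    return np.outer(a, b)


def tdacm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Asynchronous map: outer product of a with the Hilbert transform of b."""
    return np.outer(a, hilbert_transform(b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    # Two synthetic stand-ins for spectra of the same category.
    s1 = rng.random(n)
    s2 = rng.random(n)
    print(ccm(s1, s2).shape)    # (15,)
    print(tdscm(s1, s2).shape)  # (8, 8)
    print(tdacm(s1, s2).shape)  # (8, 8)
```

In a full pipeline each map would presumably be computed for a pair of spectra from the same paper category. The N×N maps would then be flattened before being passed to a regressor such as KNN or RF. The abstract does not state how the pairs are selected, so that step is left out here.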
