Given the growing urgency of plastic management and regulation worldwide, recent studies have investigated how to identify plastic materials so that they can be correctly classified and disposed of. Recent work has shown that machine learning techniques can successfully classify microplastics from Raman spectroscopy signals, identifying the type of microplastic directly from the measured optical spectra. In this paper, we investigate the impact of high-frequency noise on the performance of such classification tasks. It is well known that Raman-based classification depends heavily on peak visibility, yet signal smoothing is a common step in pre-processing the measured signals. This raises a potential trade-off between suppressing high-frequency noise and preserving peaks, one that depends on user-defined smoothing parameters. The results obtained in this work suggest that a linear discriminant analysis (LDA) model cannot generalize properly in the presence of noisy signals, whereas an error-correcting output codes (ECOC) model is better suited to handling inherent noise. Moreover, principal component analysis (PCA) can become an essential step for building robust classification models, given its simplicity and natural smoothing capabilities. The fundamental aspects of this work are our study of high-frequency noise, the possible trade-off between pre-processing that noise and preserving peak visibility, and the use of PCA as a noise reduction technique in addition to its dimensionality reduction role.
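The trade-off between noise suppression and peak preservation can be illustrated with a minimal sketch. The abstract does not say which smoother is used, so this example assumes a simple centered moving average (a stand-in; Savitzky-Golay filtering is another common choice) applied to a synthetic, hypothetical spectrum with one narrow and one broad Gaussian band plus white noise. All function names, peak positions, widths, and the noise level are illustrative assumptions, not values from the paper. Because the filter is linear, the smoothed noise and the smoothed clean peaks can be scored separately.

```python
import math
import random


def gaussian_peak(x, center, width, height):
    """Gaussian band profile used here as a stand-in for a Raman peak."""
    return height * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def moving_average(signal, window):
    """Centered moving average; edge samples use a truncated window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def tradeoff_table(windows, sigma=0.05, n=400, seed=0):
    """For each window width, return (window, residual noise RMS,
    fraction of narrow-peak height kept, fraction of broad-peak height kept)."""
    rng = random.Random(seed)
    # Hypothetical spectrum: a narrow band at index 120 and a broad band at 280.
    clean = [gaussian_peak(x, 120, 3, 1.0) + gaussian_peak(x, 280, 12, 0.6)
             for x in range(n)]
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    narrow, broad = slice(100, 141), slice(250, 311)
    rows = []
    for w in windows:
        # Linearity: smooth(clean + noise) == smooth(clean) + smooth(noise),
        # so peak loss and noise reduction can be measured independently.
        smooth_clean = moving_average(clean, w)
        smooth_noise = moving_average(noise, w)
        rows.append((
            w,
            rms(smooth_noise),
            max(smooth_clean[narrow]) / max(clean[narrow]),
            max(smooth_clean[broad]) / max(clean[broad]),
        ))
    return rows


if __name__ == "__main__":
    for w, noise_rms, narrow_kept, broad_kept in tradeoff_table([1, 5, 11, 21, 41]):
        print(f"window={w:2d}  noise_rms={noise_rms:.4f}  "
              f"narrow_peak={narrow_kept:.2f}  broad_peak={broad_kept:.2f}")
```

Widening the window lowers the residual noise but also flattens the peaks, and the narrow band loses far more of its height than the broad one. The smoothing window is therefore the kind of user-defined parameter that can trade noise for peak visibility.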