In recent years, oil exploration and exploitation activities have increased significantly, resulting in spills that pose a severe threat to the environment and public health. The present work aims to develop a method for detecting and classifying hydrocarbon-contaminated soils that is useful for analyzing contaminated sites. The method combines machine learning algorithms with data obtained by laser-induced breakdown spectroscopy (LIBS). In the first stage, the experimental parameters of the LIBS technique were optimized using eleven soil samples contaminated with different hydrocarbons and one control sample. Classifying the samples effectively required a robust and interpretable method. Linear discriminant analysis (LDA) was chosen because it identifies the linear combination of features that best separates the classes while remaining simple and interpretable. To reduce dimensionality and limit the risk of overfitting, principal component analysis (PCA) was applied before LDA. This preprocessing step improved the classification of samples contaminated with eleven different hydrocarbon sources and distinguished them from the control class, with accuracies greater than 90%. The model was also used to discriminate subsets of classes that shared similarities, which became apparent from the analysis of the entire class set. The approach successfully classified related classes, such as gasoline and different oils, achieving 100% accuracy in all cases. This enhanced capacity to identify and differentiate hydrocarbons with LIBS and machine learning marks a significant advancement in environmental monitoring.
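The PCA-then-LDA workflow described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic spectra rather than real LIBS data. The helper names (`make_synthetic_spectra`, `build_pca_lda`), the number of channels, the number of retained components, the standardization step, and the 5-fold cross-validation scheme are all hypothetical choices made for illustration; the abstract does not specify them. Only the twelve classes (eleven hydrocarbon sources plus one control) and the PCA-before-LDA order come from the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_spectra(n_classes=12, n_per_class=30, n_channels=500, seed=0):
    """Generate stand-in 'spectra' (hypothetical data, not real LIBS measurements)."""
    rng = np.random.default_rng(seed)
    spectra, labels = [], []
    for label in range(n_classes):
        # Each class gets its own random intensity pattern across channels.
        template = rng.normal(0.0, 1.0, n_channels)
        # Individual shots are the class pattern plus measurement noise.
        shots = template + rng.normal(0.0, 0.5, (n_per_class, n_channels))
        spectra.append(shots)
        labels.extend([label] * n_per_class)
    return np.vstack(spectra), np.array(labels)


def build_pca_lda(n_components=20):
    """Standardize, reduce dimensionality with PCA, then classify with LDA.

    n_components=20 is an assumed value; in practice it would be tuned.
    """
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        LinearDiscriminantAnalysis(),
    )


# 12 classes: eleven contaminant sources plus one control, as in the abstract.
X, y = make_synthetic_spectra()

# Stratified folds keep every class represented in each train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(build_pca_lda(), X, y, cv=cv)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```

Placing PCA inside the pipeline means the projection is refit on each training fold, so the held-out spectra never influence the components. To examine a subset of similar classes, as the abstract describes for gasoline and oils, the same pipeline could be refit on rows selected with a mask such as `np.isin(y, [0, 1, 2])`, where the class indices are again placeholders.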