Microplastics (MPLs) are ubiquitous particles derived from the degradation of plastic refuse or directly from anthropogenic sources. These particles are present in aquatic environments and are potentially toxic across the biosphere. To characterize their presence, the Neuse River basin was selected for sampling. However, as with many MPLs, spectroscopic characterization can be compromised by weathering processes that obscure the native spectra. Therefore, to enhance MPL identification, Principal Component Analysis (PCA), K-means Clustering (KMC), and the machine learning (ML) modules of MATLAB and SAS Viya were applied to the Neuse River basin samples. Eighteen unknown samples were recorded with attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy and compared against the following controls for PCA and KMC: high-density polyethylene (HDPE), low-density polyethylene (LDPE), polypropylene (PP), polystyrene (PS), polyvinyl chloride (PVC), polyethylene terephthalate (PET), and nylon-6 (PA6). These unknowns were later tested in MATLAB and SAS Viya against a new set of 900 commercial control samples partitioned into 9 classes: cellulose acetate (CA), HDPE, LDPE, PP, PS, PVC, PET, poly(methyl methacrylate) (PMMA), and PA6, using µ-FTIR (micro-FTIR) for single-particle discrimination. MATLAB’s Classification Learner and SAS Viya for Learners were used to aggregate multiple ML models and test whether their predictions corroborated one another on datasets derived from the Version 1 (V1), Version 2 (V2), and Version 3 (V3) feature extraction algorithms. The novel features extracted in V2, based on regressions of discrete data from spectral intensities, and those in V3, which probe the relationships informing those regressions, showed moderate to moderately high predictor strength according to various predictor strength algorithms in MATLAB, contributing to the accuracy increase in V3. 
In the test scenario of the strongest version, V3, 63.2% (+5.3% from V1) of the models performed very strongly (90% accuracy cutoff), and 89.5% (+0% from V1) of the models performed moderately strongly (80% accuracy cutoff). The models, coupled with the unsupervised PCA and KMC, identified the microparticle (MP) SHR-1b(2), from Stockinghead Creek in Duplin County, NC, as LDPE. However, the change in corroboration across model types was not significant according to a Kruskal-Wallis test (H = 0.555, p = .7577). An ANOVA performed to test whether LDPE incidence was similar across all unknowns indicated a similar ratio of predictions, excluding NRCP-1 (F = 31.479, p < .0001). While it is unclear whether this particle is truly LDPE, the results suggest that LDPE, or PE more broadly, may be highly prevalent in the river basin when the predictor sets across all versions are taken into account. Lastly, misclassification was further elucidated for samples “CF-3” and “NRCP-1(2)”, which are thought to be extensively degraded PVC MPLs.
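The two summary statistics reported above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the abstract: that the model pool has 19 members (the reported 63.2%, 89.5%, and 5.3% are consistent with 12/19, 17/19, and 1/19), and that the Kruskal-Wallis test compared three groups, so that H is referred to a chi-square distribution with 2 degrees of freedom. The accuracy values are hypothetical placeholders, not the study's results, and the function names are invented for this example.

```python
import math


def share_at_or_above(accuracies, cutoff):
    """Fraction of models whose accuracy meets or exceeds the cutoff."""
    return sum(a >= cutoff for a in accuracies) / len(accuracies)


def chi2_upper_tail_df2(h):
    """Upper-tail probability of a chi-square variable with 2 degrees of
    freedom, which has the closed form exp(-h / 2)."""
    return math.exp(-h / 2.0)


# Hypothetical accuracies for an assumed pool of 19 models
# (placeholder values chosen only to illustrate the cutoff counts).
accuracies = [0.97] * 12 + [0.85] * 5 + [0.70] * 2

very_strong = share_at_or_above(accuracies, 0.90)        # 12 of 19 models
moderately_strong = share_at_or_above(accuracies, 0.80)  # 17 of 19 models
print(f"very strong: {very_strong:.1%}")
print(f"moderately strong: {moderately_strong:.1%}")

# Kruskal-Wallis H is approximately chi-square with (groups - 1) degrees of
# freedom; three groups (an assumption here) gives 2 degrees of freedom.
p_value = chi2_upper_tail_df2(0.555)
print(f"p = {p_value:.4f}")
```

Under these assumptions the cutoff shares round to 63.2% and 89.5%, and the chi-square approximation returns p ≈ 0.7577 for H = 0.555, matching the reported values.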