The detection of trace levels of molecular gases has gained increasing attention in many fields, from atmospheric pollution and climate change monitoring to industrial safety and breath analysis for clinical diagnosis. Established techniques, e.g. mass spectrometry, gas chromatography and electrochemical sensing, offer accuracy but are bulky and expensive. Apart from improving limits of detection (LOD) and increasing the number of target species, there is a major drive towards system miniaturisation and cost reduction in order to enhance field deployment, e.g. for rapid continuous environmental monitoring via autonomous distributed networks or point-of-care clinical breath screening. Tuneable diode laser IR absorption spectroscopy (TDLAS) and inductively coupled plasma atomic emission spectroscopy (ICP-AES) are routine laboratory spectroscopic techniques, where future miniaturisation research includes, for example, microwave, photoacoustic-MEMS, broadband tuneable quantum cascade or supercontinuum IR lasers and high-sensitivity nanomaterials, among others. The advent of non-equilibrium low-temperature (NELT) atmospheric pressure plasmas opens up the possibility of using plasma optical emission spectroscopy (OES) for portable and/or low-cost detection in a diverse range of applications. However, the poor quality of the spectra from these plasmas requires additional machine learning techniques to develop accurate models based on optical emission training samples. Our original work involving unsupervised principal component analysis indicated significant cluster separation even at sub-ppm levels of impurity gases, e.g. NO. Recently we have investigated supervised learning with Partial Least Squares Discriminant Analysis (PLS-DA), using a dataset of He-CH4 spectra in which the CH4 concentration varies from 0 to 100 ppm.
Methane is an important hydrocarbon gas found in a number of fields, from breath analysis to natural gas production, and research is ongoing into accurate environmental CH4 detectors in the ppm range.
The spectra were obtained from a small RF capillary plasma, 0.7 mm in diameter and 5 mm long, using an Ocean Optics HR4000CG-UV-NIR spectrometer in the wavelength range 194 to 1122 nm (interval 0.25 nm). Data were collected across 3648 variables (wavelengths) with up to 720 samples per dataset and up to 9 CH4 concentration categories (0 to 100 ppm). Spectral features corresponding to He, hydrogen, carbon-related species and impurities (N2, O2, OH/H2O) were observed. No peak can be assigned unambiguously to any particular species. The dominant peaks for 100% He (587.95 nm, 707.08 nm) varied by ~26% (std. dev.). On introduction of CH4, their intensity remained constant (within 1 std. dev.) up to ~23 ppm, falling thereafter. In these small atmospheric plasmas, the gas temperature remains cold and hydrocarbon fragmentation is thought to be minimal. Minor peaks indicative of possible CH and C2 fragments were observed, but mainly at the higher concentrations. Atomic hydrogen is a possible CH4 fragment, although a contribution from H2O impurity fragmentation is also likely. The other impurity peaks, although of low intensity, have been found to play an important role in algorithm recognition.
Predictive models were generated using PLS-DA, with two general approaches explored, namely (i) 2-class and (ii) 8-class. In the former, for a threshold of 2 ppm CH4, the model accuracy was > 95% with < 10 latent variables (LV). In the 8-class model, the initial accuracy struggled to reach 60%, despite the range of standard pre-processing approaches employed. Our OES spectra represent high-dimensionality collinear data with inherent pattern variability, temporal drift and low resolution as a penalty for the simple low-cost construction.
This presents a serious challenge to developing robust machine learning algorithms. Variable Importance in Projection (VIP) outputs the relative significance of each wavelength variable to the model’s predictive accuracy and provides insight into how algorithms function and their sensitivity to OES features and plasma chemistry. Using VIP-informed wavelength feature selection and peak modification, we obtain high predictive accuracy (>90%) for separate training and test session data, with a LOD of 1 ppm. Analysis shows that, of the 12 most significant peaks, 6 are due to impurity transitions, while the three possible CH4 fragment-related transitions are noticeable only at high CH4 concentration. While high-temperature plasma techniques (e.g. ICP) rely on molecular fragmentation, this is unlikely to occur in NELT plasmas due to the low energy density. Instead, we believe that the introduction of the target molecule into the plasma affects the low-energy side of the electron energy distribution function, changing electron density and temperature, which in turn is visible in the excitation and vibrational/rotational transitions of carrier and impurity gases. The development of more sophisticated machine learning algorithms capable of handling spectral data from more complex target mixtures will both provide insight into, and progress in tandem with, an improved understanding of NELT plasma physics and chemistry parameters. Figure 1
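The 2-class PLS-DA and VIP ranking described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic stand-ins for spectra, the channel indices, signal amplitude and number of latent variables are arbitrary assumptions, and the helper names `pls1_fit` and `vip_scores` are hypothetical. The sketch uses a plain NIPALS PLS1 regression on a 0/1 class label (below or above a concentration threshold) and the standard VIP formula.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (NIPALS) on mean-centred X and y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q, ss = [], [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # unit-norm weight vector
        t = Xr @ w                      # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        qa = yr @ t / tt                # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - qa * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
        ss.append(qa ** 2 * tt)         # y variance explained by this LV
    W, P = np.array(W).T, np.array(P).T
    q, ss = np.array(q), np.array(ss)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector
    return {"W": W, "ss": ss, "coef": coef, "x_mean": x_mean, "y_mean": y_mean}


def pls1_predict(model, X):
    return (X - model["x_mean"]) @ model["coef"] + model["y_mean"]


def vip_scores(model):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a))."""
    W, ss = model["W"], model["ss"]
    return np.sqrt(W.shape[0] * (W ** 2 @ ss) / ss.sum())


# Synthetic "spectra": 200 samples, 100 wavelength channels of unit noise.
# Channels 40 and 75 are ASSUMED to respond to the class label; they are
# illustrative only and do not correspond to any real emission line.
rng = np.random.default_rng(0)
n_samples, n_channels = 200, 100
labels = (np.arange(n_samples) % 2).astype(float)   # 0 = below, 1 = above threshold
spectra = rng.normal(size=(n_samples, n_channels))
spectra[:, 40] += 4.0 * labels
spectra[:, 75] -= 4.0 * labels

# Separate training and test sets, as a stand-in for separate sessions.
X_train, y_train = spectra[:150], labels[:150]
X_test, y_test = spectra[150:], labels[150:]

model = pls1_fit(X_train, y_train, n_components=2)
predicted = (pls1_predict(model, X_test) > 0.5).astype(float)
accuracy = float(np.mean(predicted == y_test))

vip = vip_scores(model)
top = sorted(int(i) for i in np.argsort(vip)[-2:])   # two highest-VIP channels

print("test accuracy:", accuracy)
print("highest-VIP channels:", top)
```

Because the weight vectors are normalised, the squared VIP scores average to 1 across channels, which is why VIP > 1 is a common (heuristic) cut-off for feature selection. A multi-class model would need one response column per class, which this single-response sketch does not cover.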