Surface-enhanced Raman spectroscopy (SERS) is a subfield of Raman spectroscopy in which trace compounds can be detected with the help of plasmonic materials, which enhance the Raman signal of molecules in their proximity. Our work exploits the low number of molecules with enhanced signal by acquiring sequences of spectra with short integration times. This enables us to observe events in which a Raman-active molecule briefly adsorbs to the surface in an enhancing spot and gives a strong but anisotropic signal. Because of this, conventional classification algorithms are not well suited to efficient classification.

The novel data analysis method presented here is based on active spectrum detection, where spectra are kept based on how many Raman peaks they contain. The end goal is the characterization and quantification of common drugs and cutting agents in a short time frame. A reusable microfluidic SERS sensor is used in batch-flow mode, so that a technician could inject a diluted drug sample, analyse it for a few minutes and get the result on screen.

The current architecture for classifying the spectra is represented by the flowchart in Figure 1. The preprocessing and active spectrum detection are both classical algorithms, in the sense that they are not optimized by training as machine learning algorithms are, and simply apply the same rules to every spectrum. The preprocessing applies Savitzky-Golay smoothing, generates a baseline with airPLS, and uses a peak-finding function to select and count valid peaks and thereby determine whether a spectrum is active. The active spectra are then processed with a multiclass CNN.

To create a streamlined quantification pipeline, two quantification algorithms are compared on drug spectra: a CNN that processes “stitchings” of multiple random spectra, and a custom network based on modular neural networks.
Both are also compared with various active spectrum detection and classifier methods.

The current active spectrum detection is based on peak detection with an arbitrary height threshold: a peak that exceeds the threshold is an active peak, and a spectrum needs an arbitrary number of active peaks to be considered active. This method can separate good spectra from inactive and weakly active ones, but the weakly active ones may still contain information that the classification network could process. The proposed quantification pipeline makes each network dependent on the previous ones, and as such, parallel optimization is possible during training. By setting a quality score based on how active the spectra are, we can let the network optimize the trade-off between the number of spectra considered for classification and the accuracy of the classification, because weakly active spectra can lower the accuracy when they do not carry enough information to be classified correctly.

As mentioned earlier, a multiclass CNN is used. This type of classifier uses an All vs. All (AVA) strategy: all classes are tested against each other, and one of the classes must be chosen. For the classification of physiological spectra, this is problematic. The active spectrum of an unknown molecule would reach the classifier and be forced into one of the learned classes, and the result would be inherently false.

To avoid this problem, two different types of classification architecture are tested with the quantification network:

1. An ensemble of binary CNN classifiers with a One vs. All (OVA) strategy, where each classifier has only two outcomes.
2. An ensemble of one-class classifiers (OCC), where the only possible outcome is Yes or No for one class each.

They might look the same at first glance, but the difference is vast. One-class classification is also called Concept Learning and Anomaly Detection.

Figure 1
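The peak-count rule used for active spectrum detection can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold `height=5.0`, the minimum count `min_peaks=3`, the smoothing window and the added prominence filter are all assumptions chosen for illustration. In particular, airPLS is not part of SciPy, so a simple iterative polynomial baseline is swapped in as a stand-in.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks


def polynomial_baseline(y, degree=3, n_iter=20):
    """Iterative polynomial baseline. A simple stand-in for airPLS (assumption)."""
    x = np.linspace(-1.0, 1.0, y.size)
    work = np.asarray(y, dtype=float).copy()
    fit = work
    for _ in range(n_iter):
        coeffs = np.polynomial.polynomial.polyfit(x, work, degree)
        fit = np.polynomial.polynomial.polyval(x, coeffs)
        # Clip everything above the fit so peaks stop pulling the baseline up.
        work = np.minimum(work, fit)
    return fit


def count_active_peaks(spectrum, height=5.0, window=11, polyorder=3):
    """Smooth, subtract the baseline, and count peaks above a fixed height."""
    smoothed = savgol_filter(spectrum, window_length=window, polyorder=polyorder)
    corrected = smoothed - polynomial_baseline(smoothed)
    # The prominence filter is an extra assumption; it suppresses noise
    # ripples riding on top of a genuine peak.
    peaks, _ = find_peaks(corrected, height=height, prominence=height / 2)
    return len(peaks)


def is_active(spectrum, min_peaks=3, **kwargs):
    """A spectrum is 'active' if it has at least min_peaks active peaks."""
    return count_active_peaks(spectrum, **kwargs) >= min_peaks
```

Because both `height` and `min_peaks` are fixed by hand, a weakly active spectrum with, say, two peaks just above the threshold is discarded outright. That is the limitation a learned quality score is meant to address.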
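The practical difference between a forced multiclass decision and an ensemble of per-class yes/no decisions can be shown with a small sketch of the decision rules alone. The score vectors below are made-up stand-ins for classifier outputs, not results from the work, and the function names and the `threshold=0.5` acceptance level are assumptions.

```python
import numpy as np


def forced_multiclass_decision(scores):
    """Multiclass rule: always returns one of the learned classes."""
    return int(np.argmax(scores))


def ensemble_decision(probabilities, threshold=0.5):
    """Per-class yes/no rule: each entry is an independent 'is it class k?' score.

    Returns the index of the most confident accepting classifier,
    or None when every classifier rejects the spectrum.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    accepted = np.flatnonzero(probabilities >= threshold)
    if accepted.size == 0:
        return None
    return int(accepted[np.argmax(probabilities[accepted])])


# Hypothetical outputs for a known compound and for an unknown molecule.
known = [0.05, 0.92, 0.10]
unknown = [0.10, 0.20, 0.15]

known_forced = forced_multiclass_decision(known)      # class 1
known_ensemble = ensemble_decision(known)             # class 1
unknown_forced = forced_multiclass_decision(unknown)  # class 1, forced
unknown_ensemble = ensemble_decision(unknown)         # None, rejected
```

With the forced rule the unknown molecule is assigned to class 1 despite a low score, whereas the ensemble rule can return no class at all, which is the behaviour wanted for spectra of molecules outside the training set.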