The combination of deep learning techniques and Raman spectroscopy shows great potential for precise and prompt identification of pathogenic bacteria in clinical settings. However, traditional closed-set classification approaches assume that all test samples belong to one of the known pathogens. Their applicability is limited because the clinical environment is inherently unpredictable and dynamic: unknown or emerging pathogens may not be included in the available catalogs. We demonstrate that current state-of-the-art neural networks that identify pathogens from Raman spectra are vulnerable to unknown inputs, resulting in an uncontrollable false positive rate. To address this issue, we first developed an ensemble of ResNet architectures combined with an attention mechanism that achieves a 30-isolate accuracy of 87.8 ± 0.1%. Second, by integrating feature regularization through the Objectosphere loss function, our model both achieves high accuracy in identifying known pathogens from the catalog and effectively separates unknown samples, drastically reducing the false positive rate. Finally, applying the proposed feature regularization during training significantly enhances the performance of out-of-distribution detectors at inference time, improving the reliability of unknown-class detection. Our algorithm for Raman spectroscopy enables the identification of previously unknown, uncataloged, and emerging pathogens, ensuring adaptability to pathogens that may surface in the future. Moreover, it can be extended to open-set medical image classification, bolstering reliability in dynamic operational settings.
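To make the feature regularization idea concrete, the following is a minimal per-sample sketch of an Objectosphere-style loss, following the commonly cited formulation of that loss rather than the authors' own implementation. Known samples are trained with cross-entropy and pushed to have a feature norm of at least a margin; unknown (background) samples are pushed toward a uniform softmax and a feature norm near zero. The function names, the margin `xi`, and the weight `lam` are illustrative assumptions and do not come from the abstract.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def objectosphere_loss(logits, features, label, xi=10.0, lam=0.01):
    """Per-sample Objectosphere-style loss (illustrative sketch).

    logits   : class scores for the known classes
    features : penultimate-layer (deep) feature vector
    label    : class index for a known sample, or None for an
               unknown/background training sample
    xi, lam  : hypothetical margin and regularization weight
    """
    log_p = log_softmax(logits)
    norm = math.sqrt(sum(f * f for f in features))
    if label is not None:
        # Known class: cross-entropy, plus a penalty if the feature
        # norm falls below the margin xi.
        entropy_term = -log_p[label]
        penalty = max(xi - norm, 0.0) ** 2
    else:
        # Unknown sample: encourage a uniform softmax output and
        # shrink the feature norm toward zero.
        entropy_term = -sum(log_p) / len(log_p)
        penalty = norm ** 2
    return entropy_term + lam * penalty


# Usage: a known sample with a short feature vector is penalized,
# while an unknown sample with zero features incurs no norm penalty.
known = objectosphere_loss([0.0, 0.0, 0.0], [3.0, 4.0], label=1)
unknown = objectosphere_loss([0.0, 0.0, 0.0], [0.0, 0.0], label=None)
```

Under this kind of training, the feature norm itself becomes informative at inference time: a small norm, or a low maximum softmax probability, suggests an input outside the known catalog, which is one plausible reason such regularization can help downstream out-of-distribution detectors.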