Applications of Data Characteristic AI-Assisted Raman Spectroscopy in Pathological Classification

Xun Chen, Jianghao Shen, Chang Liu, Xiaoyu Shi, Weichen Feng, Hongyi Sun, Weifeng Zhang, Shengpai Zhang, Yuqing Jiao, Jing Chen, Kun Hao, Qi Gao, Yitong Li, Weili Hong, Pu Wang, Limin Feng, Shuhua Yue

doi:10.1021/acs.analchem.3c04930

Abstract

Raman spectroscopy has been widely used for label-free biomolecular analysis of cells and tissues for pathological diagnosis in vitro and in vivo. AI technology facilitates Raman-based disease diagnosis through machine learning (e.g., PCA and SVM), manifold learning (UMAP), and deep learning (ResNet and AlexNet). However, it remains unclear how to select and optimize an appropriate AI classification model for different types of Raman spectral data. Here, we selected five representative Raman spectral data sets (endometrial carcinoma, hepatoma extracellular vesicles, bacteria, melanoma cells, and diabetic skin) that differ in sample size, spectral data size, Raman shift range, tissue sites, Kullback-Leibler (KL) divergence, and significant Raman shifts (i.e., wavenumbers with significant differences between groups), and used them to explore the performance of different AI models (e.g., PCA-SVM, SVM, UMAP-SVM, ResNet, and AlexNet). For data sets with a large spectral data size, ResNet performed better than PCA-SVM and UMAP. By building a data characteristic-assisted AI classification model, we optimized the network parameters of the AI model (e.g., the number of principal components, activation function, and loss function) based on data characteristics such as data size and KL divergence. The accuracy improved from 85.1 to 94.6% for endometrial carcinoma grading, from 77.1 to 90.7% for hepatoma extracellular vesicle detection, from 89.3 to 99.7% for melanoma cell detection, from 88.1 to 97.9% for bacterial identification, and from 53.7 to 85.5% for diabetic skin screening, with a mean time expense of 5 s.
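The abstract lists KL divergence among the data characteristics used to guide model choice but does not state how it is computed. The sketch below assumes one plausible reading: each group's mean spectrum, sampled on a shared Raman shift axis, is clipped to non-negative values, normalized to sum to one, and treated as a discrete probability distribution over wavenumbers. The function names (`normalize`, `mean_spectrum`, `kl_divergence`, `group_kl`) and the `eps` smoothing constant are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence


def normalize(spectrum: Sequence[float], eps: float = 1e-12) -> list[float]:
    """Clip negative intensities to zero, add eps, and rescale to sum to 1.

    Assumption: eps keeps every bin strictly positive so log(p / q) is defined.
    """
    clipped = [max(x, 0.0) + eps for x in spectrum]
    total = sum(clipped)
    return [x / total for x in clipped]


def mean_spectrum(spectra: Sequence[Sequence[float]]) -> list[float]:
    """Average a group of spectra wavenumber by wavenumber."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i), in nats."""
    if len(p) != len(q):
        raise ValueError("spectra must share the same Raman shift axis")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def group_kl(group_a: Sequence[Sequence[float]],
             group_b: Sequence[Sequence[float]]) -> float:
    """Hypothetical data-characteristic score: KL divergence between group means."""
    p = normalize(mean_spectrum(group_a))
    q = normalize(mean_spectrum(group_b))
    return kl_divergence(p, q)


if __name__ == "__main__":
    # Toy spectra on a 4-point Raman shift axis (made-up numbers, not study data).
    control = [[1.0, 2.0, 3.0, 2.0], [1.2, 2.1, 2.9, 1.8]]
    disease = [[1.0, 3.0, 2.0, 2.0], [0.8, 2.9, 2.1, 2.2]]
    print(f"KL(control || disease) = {group_kl(control, disease):.4f} nats")
```

KL divergence is asymmetric, so `group_kl(a, b)` and `group_kl(b, a)` generally differ; a symmetric variant (for example, averaging both directions) is an equally plausible choice that the abstract does not rule in or out. A value near zero suggests the groups' average spectra are hard to tell apart, which is one intuition for why low-divergence data sets such as the diabetic skin set might need more careful model tuning.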

Full Text