Abstract

Raman spectroscopy combined with artificial intelligence (AI) is widely used in medical diagnostic research and has great application value. However, some diseases have low prevalence and research samples are difficult to obtain, which readily leads to data imbalance in medical Raman spectroscopy research. When the imbalance is not addressed, AI classification and diagnosis algorithms favor the majority class and neglect the minority class, reducing the accuracy of disease identification. To address this problem, this paper proposes a hybrid sampling technique, Raman-Gaussian distributed oversampling fused with random undersampling (R-GDORUS), for medical Raman spectroscopy data. The density and distance information carried by the minority samples is used to compute a selection probability for each minority sample; anchor samples are then chosen from the minority class, and new minority samples are generated around them according to a Gaussian distribution. Finally, a random undersampling strategy removes some of the majority class spectral samples. The technique and five other mainstream methods for handling imbalanced data are applied to three major types of imbalanced medical Raman spectroscopy datasets (malignant tumors, class B infectious diseases and autoimmune diseases), and performance is evaluated using the AUC and G-mean values. The results demonstrate that the proposed technique effectively mitigates the degradation in model performance caused by spectral data imbalance and has good application prospects.
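The hybrid sampling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the way density and distance are combined (k-nearest-neighbour density plus distance to the minority centroid), the noise scale, and all function and parameter names (selection_probabilities, gaussian_oversample, hybrid_resample, k, scale) are assumptions made for illustration only.

```python
import numpy as np


def _to_probabilities(v):
    """Scale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = v.sum()
    if total <= 0:
        return np.full(len(v), 1.0 / len(v))
    return v / total


def selection_probabilities(X_min, k=5):
    """Assumed weighting: combine a k-NN density term with a distance term."""
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority spectra.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    # Column 0 of each sorted row is the zero self-distance, so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    # Distance information: here, distance to the minority centroid (assumption).
    centroid_dist = np.linalg.norm(X_min - X_min.mean(axis=0), axis=1)
    score = _to_probabilities(density) + _to_probabilities(centroid_dist)
    return _to_probabilities(score)


def gaussian_oversample(X_min, n_new, rng, k=5, scale=0.1):
    """Draw anchors by selection probability, then add Gaussian noise around them."""
    probs = selection_probabilities(X_min, k)
    anchors = rng.choice(len(X_min), size=n_new, p=probs)
    # Per-feature noise width: a fraction of the minority standard deviation (assumption).
    sigma = scale * X_min.std(axis=0)
    noise = rng.normal(0.0, 1.0, size=(n_new, X_min.shape[1])) * sigma
    return X_min[anchors] + noise


def hybrid_resample(X, y, minority_label, n_new, n_keep, seed=0):
    """Oversample the minority class, then randomly undersample the rest."""
    rng = np.random.default_rng(seed)
    is_min = y == minority_label
    X_syn = gaussian_oversample(X[is_min], n_new, rng)
    # Random undersampling: keep n_keep majority samples without replacement.
    maj_idx = np.flatnonzero(~is_min)
    kept = rng.choice(maj_idx, size=n_keep, replace=False)
    X_res = np.vstack([X[is_min], X_syn, X[kept]])
    y_res = np.concatenate([y[is_min], np.full(n_new, minority_label), y[kept]])
    return X_res, y_res
```

For example, with 100 majority and 20 minority spectra, calling hybrid_resample(X, y, minority_label=1, n_new=30, n_keep=50) would return a balanced set of 50 samples per class. In practice the resampling would be applied only to the training split, so that the AUC and G-mean are computed on untouched test data.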
