Application of spectral small-sample data combined with a method of spectral data augmentation fusion (SDA-Fusion) in cancer diagnosis

Xudan Zhang, Hongyi Li, Chen Chen, Cheng Chen, Xuecong Tian, Ying Su, Min Li, Jianying Lv, Xiaoyi Lv

doi:10.1016/j.chemolab.2022.104681

Abstract

Background: Cancer is among the most life-threatening diseases, and accurate diagnosis is the prerequisite for precise treatment. Computer-aided vibrational spectroscopy has achieved promising results in intelligent cancer diagnosis. However, the limited number of clinical cancer cases and the cost of spectral acquisition make it difficult to obtain large amounts of spectral data, which constrains the optimization and improvement of diagnostic models.

Method: To address these challenges, we adopted several data augmentation strategies to obtain more usable training data. In addition to the augmentation methods commonly used in vibrational spectroscopy (adding random noise; adding random variations in offset, multiplication and slope; and the synthetic minority over-sampling technique, SMOTE), we compared two generative adversarial networks with different architectures, one based on artificial neural networks (ANN) and the other on convolutional neural networks (CNN). t-distributed stochastic neighbor embedding (t-SNE) visualization and the cosine similarity (CS) measure were used to evaluate the quality of the generated spectra. Different augmentation techniques produced new spectra with different characteristics, and effectively merging this heterogeneous information can further enlarge the sample space and increase sample diversity. We therefore propose a spectral data augmentation fusion (SDA-Fusion) method, which fuses the new data generated by the five augmentation techniques above to obtain more usable instances.
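As a rough illustration of the simpler steps named above, the sketch below applies random noise and random offset, multiplication and slope variations to toy spectra, scores the result with cosine similarity, and fuses the outputs by stacking them. It is only a sketch under stated assumptions: the function names, the perturbation scales, and the synthetic single-peak spectra are all hypothetical, and SMOTE and the two GANs are omitted, so the fusion here combines two techniques rather than the five used in the study.

```python
import numpy as np

def add_noise(X, rng, scale=0.01):
    """Add Gaussian noise scaled to each spectrum's standard deviation."""
    sd = X.std(axis=1, keepdims=True)
    return X + rng.normal(0.0, 1.0, X.shape) * scale * sd

def offset_multiply_slope(X, rng, scale=0.05):
    """Apply a random offset, multiplicative factor and linear slope per spectrum."""
    n, p = X.shape
    sd = X.std(axis=1, keepdims=True)
    offset = rng.uniform(-scale, scale, (n, 1)) * sd
    factor = 1.0 + rng.uniform(-scale, scale, (n, 1))
    slope = rng.uniform(-scale, scale, (n, 1)) * sd
    ramp = np.linspace(0.0, 1.0, p)[None, :]  # runs from 0 to 1 across the axis
    return factor * X + offset + slope * ramp

def cosine_similarity(a, b):
    """Row-wise cosine similarity between matching spectra in a and b."""
    num = np.sum(a * b, axis=1)
    return num / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def fuse(X, y, augmenters, rng):
    """Stack the original spectra with the output of every augmenter."""
    X_parts = [X] + [aug(X, rng) for aug in augmenters]
    y_parts = [y] * len(X_parts)  # augmented spectra keep their source labels
    return np.vstack(X_parts), np.concatenate(y_parts)

# Hypothetical toy data: six spectra with one Gaussian peak plus a random baseline.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 1800, 200)
peak = np.exp(-((wavenumbers - 1000) / 50) ** 2)
X = peak[None, :] + 0.05 * rng.random((6, 200))
y = np.array([0, 0, 0, 1, 1, 1])

X_fused, y_fused = fuse(X, y, [add_noise, offset_multiply_slope], rng)
cs = cosine_similarity(X, add_noise(X, rng))
print(X_fused.shape, y_fused.shape, cs.round(4))
```

Stacking is the simplest possible reading of "fusing"; the abstract does not specify how the fused set is assembled, so any weighting or filtering of generated spectra would need to come from the full paper.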
Finally, we designed three groups of experiments that took the original training data, the augmented training data, and the fused training data as input. Support vector machines (SVM) with different kernel functions, a CNN, and ResNet served as classification models, and group five-fold (Group5Fold) cross-validation was used to assess model performance.

Results: We applied these augmentation methods and experimental designs to two real datasets: a Raman spectral dataset of lung cancer and a mid-infrared spectral dataset of glioma. The results show that generative adversarial networks, through adversarial learning, can produce new data that approximate the original spectra, so this technique can serve as a complementary means of expanding vibrational spectroscopy datasets. With the different augmentation strategies, most classifiers achieved higher classification accuracy than when trained on the original training set. The proposed SDA-Fusion method yields a larger and more heterogeneous dataset, and models trained on it were more robust and gave better predictive performance on both spectral datasets.

Conclusion: This research addresses the shortage of cancer vibrational spectra at the data level. It offers ideas that other researchers can consult when facing small-sample learning tasks for vibrational spectra.
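The evaluation protocol can be sketched as follows, assuming that "Group5Fold" corresponds to scikit-learn's `GroupKFold` with five splits and that the grouping variable is the patient (the abstract states neither). The toy data, the `augment` placeholder, and the RBF kernel choice are hypothetical. The point of the sketch is that augmentation is applied only to the training portion of each fold, so generated spectra never leak into the test set.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical toy data: 10 patients, 4 spectra each, 50 spectral points.
n_patients, per_patient, n_points = 10, 4, 50
groups = np.repeat(np.arange(n_patients), per_patient)
y = (groups % 2).astype(int)  # class alternates by patient
X = rng.normal(0.0, 1.0, (groups.size, n_points)) + 2.0 * y[:, None]

def augment(X_train, y_train, rng, scale=0.05):
    """Placeholder augmenter: append one noisy copy of each training spectrum."""
    noisy = X_train + rng.normal(0.0, scale, X_train.shape)
    return np.vstack([X_train, noisy]), np.concatenate([y_train, y_train])

scores = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    # No group (patient) may appear on both sides of the split.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    # Augment the training fold only; the test fold stays untouched.
    X_tr, y_tr = augment(X[train_idx], y[train_idx], rng)
    clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
    scores.append(clf.score(X[test_idx], y[test_idx]))

mean_accuracy = float(np.mean(scores))
print(f"mean accuracy over {len(scores)} folds: {mean_accuracy:.3f}")
```

Swapping `augment` for a fused augmenter, or `SVC` for another classifier, leaves the fold logic unchanged, which is what makes the three experiment groups directly comparable.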

Journal: Chemometrics and Intelligent Laboratory Systems
Publication Date: Oct 3, 2022
Citations: 5

Similar Papers

Augmentation of Small Training Data Using GANs for Enhancing the Performance of Image Classification
29 Dec 2020

Augmentation of Small Training Data Using GANs for Enhancing the Performance of Image Classification
Shih-Kai Hung ... John Q Gan
10 Jan 2021

A review of synthetic and augmented training data for machine learning in ultrasonic non-destructive evaluation
Sebastian Uhlig ... Matthias Wolff
Ultrasonics | VOL. 134
18 May 2023

One-shot segmentation of novel white matter tracts via extensive data augmentation and adaptive knowledge transfer.
Wan Liu ... Yaou Liu
Medical Image Analysis | VOL. 90
01 Dec 2023
