Abstract

Because of differences in disease prevalence and hospital specialization, spectral data are often collected with unbalanced sample sizes across classes. To address this problem, this research proposes a new sampling method, grouped-sampling, which is shown to be effective for unbalanced data. It avoids the over-fitting associated with over-sampling and the under-utilization of data associated with under-sampling. In this study, we applied grouped-sampling to two unbalanced datasets with sample proportions of 199:40 and 75:225, and evaluated it with two classic models: PCA-SVM (Principal Component Analysis-Support Vector Machine) and the deep learning algorithm GoogLeNet. The accuracies on the two datasets were 85.11% and 96.15% with PCA-SVM, and 85.10% and 84.61% with GoogLeNet. The F1-score was also evaluated to measure how balanced the classification is under each sampling method, and the results show that the F1-score of grouped-sampling is always the highest compared with over-sampling and under-sampling. In summary, compared with traditional sampling methods, grouped-sampling performs better at predicting classes with smaller sample sizes, which means it can improve the balance of classification results and the potential for practical application. We therefore developed a grouped-sampling method, distinct from both under-sampling and over-sampling, that greatly improves the accuracy and balance of predictions for unbalanced samples.
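
The abstract does not spell out how the groups are formed, so the following is only a minimal sketch under a stated assumption: the majority class is shuffled and cut into groups of roughly the minority-class size, so that each group can be paired with the full minority class. The helper names `split_into_groups` and `f1_score` are hypothetical and not taken from the paper. The second part shows why the F1-score is reported alongside accuracy: on the 199:40 dataset, a classifier that always predicts the majority class already reaches about 83% accuracy while its F1-score on the minority class is zero.

```python
import random


def split_into_groups(majority_ids, minority_size, seed=0):
    """Shuffle majority-class indices and cut them into groups of about
    minority_size each. ASSUMPTION: this grouping rule is illustrative,
    not the authors' published procedure."""
    ids = list(majority_ids)
    random.Random(seed).shuffle(ids)
    n_groups = max(1, round(len(ids) / minority_size))
    # Striding spreads the remainder so group sizes differ by at most one.
    return [ids[i::n_groups] for i in range(n_groups)]


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall; defined as 0 when tp == 0."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Dataset 1 from the abstract: 199 majority samples, 40 minority samples.
groups = split_into_groups(range(199), minority_size=40)
group_sizes = sorted(len(g) for g in groups)

# Baseline that always predicts the majority class on the 199:40 data.
baseline_accuracy = 199 / (199 + 40)
baseline_minority_f1 = f1_score(tp=0, fp=0, fn=40)

print(group_sizes)
print(f"baseline accuracy: {baseline_accuracy:.4f}, "
      f"minority F1: {baseline_minority_f1:.1f}")
```

Under this assumption, each group would be combined with all 40 minority samples to train on a roughly balanced subset; how the per-group predictions are then combined is not stated in the abstract and is left open here.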
