Abstract

Owing to limitations in disease prevalence and hospital specialization, spectral data are often collected with unbalanced sample sizes. To address this problem, this research proposes a new sampling method, grouped-sampling, which is shown to be effective for unbalanced data. It avoids the over-fitting associated with over-sampling and overcomes the under-utilization of data associated with under-sampling. We applied grouped-sampling to two unbalanced datasets with sample proportions of 199:40 and 75:225. We then validated it with two classic models: PCA-SVM (Principal Component Analysis-Support Vector Machine) and the deep learning model GoogLeNet. The accuracies on the two datasets were 85.11% and 96.15% with PCA-SVM, and 85.10% and 84.61% with GoogLeNet. The F1-score was also evaluated to measure how balanced the classification is under each sampling method. The results show that grouped-sampling always achieves the highest F1-score compared with over-sampling and under-sampling. In summary, grouped-sampling gives better predictions than traditional sampling methods for the classes with smaller sample sizes. This means it improves the balance of the classification results and their potential for practical application. We therefore present grouped-sampling as a method distinct from both under- and over-sampling, which greatly improves the accuracy and balance of predictions on unbalanced samples.
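The abstract blames over-sampling for over-fitting and under-sampling for wasting data, and it reports F1 alongside accuracy. The Python sketch below shows both baselines and the F1 computation. It uses hypothetical sample IDs that mirror the 199:40 split of the first dataset. It assumes plain random duplication and random subset selection as the baselines, because the abstract does not name the exact variants. It does not implement grouped-sampling, since the abstract does not describe that procedure.

```python
import random


def random_oversample(majority, minority, seed=0):
    """Duplicate randomly chosen minority samples until both classes are equal in size (assumed baseline)."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


def random_undersample(majority, minority, seed=0):
    """Keep a random majority subset equal in size to the minority class (assumed baseline)."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample IDs mirroring the 199:40 class split.
majority = [f"maj{i}" for i in range(199)]
minority = [f"min{i}" for i in range(40)]

maj_o, min_o = random_oversample(majority, minority)
# 199 minority entries, but only 40 distinct samples: repeats invite over-fitting.
print(len(maj_o), len(min_o), len(set(min_o)))  # prints: 199 199 40

maj_u, min_u = random_undersample(majority, minority)
# Only 40 of the 199 majority samples are kept; 159 are never used.
print(len(maj_u), len(min_u), len(set(maj_u)))  # prints: 40 40 40

# A classifier that always predicts the majority class (label 0).
y_true = [0] * 199 + [1] * 40
y_pred = [0] * 239
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 4), f1_score(y_true, y_pred))  # prints: 0.8326 0.0
```

A majority-only predictor reaches about 83% accuracy on a 199:40 split, yet its minority-class F1 is 0. This is why an F1-type metric is useful for judging classification balance.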
