Classification and prediction with very imbalanced group sample sizes: an illustration with COVID-19 testing

Xin Qiao, Yishan Ding, George Macready, Chengbin Ying, Hong Jiao

doi:10.15406/bbij.2021.10.00342

Abstract

This study explored the prediction of COVID test results using statistical classification methods applied to available COVID-related data, such as demographic and symptom information. The performance of logistic regression, machine learning models, and latent class analysis in predicting extremely imbalanced COVID data was compared. One technical challenge in applying statistical classification methods, the extreme imbalance in group sample sizes in the COVID data, was addressed: oversampling was applied to the training dataset to mitigate the impact of this data structure on the training process. Further, an adjusted pooled sampling method based on the statistical classification results was proposed to improve the efficiency of COVID testing. Results indicate that some machine learning models (e.g., support vector machine) performed better than the traditional logistic regression model and latent class analysis under the extremely imbalanced data condition. In addition, oversampling increased the sensitivity of the various statistical classification methods across the different cut-off values applied. The adjusted pooled sampling method was shown to be more efficient than the traditional pooled sampling method.
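To make two of the ideas in the abstract concrete, here is a minimal sketch of random oversampling applied to a training set, together with sensitivity computed at a chosen cut-off. It is an illustration only and is not the authors' implementation: the function names `oversample_minority` and `sensitivity`, the toy data, and the choice of simple random duplication are all assumptions, since the abstract does not specify which oversampling variant was used.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    matches the size of the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original training data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement to close the gap to the largest class.
        extra = [rng.choice(pool) for _ in range(majority_size - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def sensitivity(y_true, scores, cutoff):
    """Fraction of true positives (label 1) whose predicted score
    is at or above the cut-off."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not positives:
        raise ValueError("no positive cases in y_true")
    return sum(s >= cutoff for s in positives) / len(positives)


# Hypothetical toy training set: 5 positive and 95 negative cases.
rows = [[i] for i in range(100)]
labels = [1] * 5 + [0] * 95
balanced_rows, balanced_labels = oversample_minority(rows, labels)
```

Oversampling is applied to the training data only, as the abstract describes, so that the test data keep their original class proportions. Lowering the cut-off passed to `sensitivity` can only keep sensitivity the same or raise it, which is why results are usually reported across several cut-off values.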

More From: Biometrics & Biostatistics International Journal
