Performance of the Classical Model in Feature Selection Across Varying Database Sizes of Healthcare Data

Kannan Thiruvengadam, Dadakhalandar Doddamani, Rajendran Krishnan

doi:10.6000/1929-6029.2024.13.21

Open Access

https://doi.org/10.6000/1929-6029.2024.13.21

Abstract

Machine learning is increasingly applied in medical research, particularly for selecting variables for predictive modelling. By identifying relevant variables, researchers can improve model accuracy and reliability and reduce overfitting, which supports better clinical decisions. Efficient use of resources and the validity of medical research findings depend on selecting the right variables. However, few studies compare the performance of classical and modern variable selection methods in health datasets, highlighting the need for a critical evaluation to choose the most suitable approach. We analysed the performance of six different variable selection methods, including stepwise, forward, and backward selection using p-value and AIC, LASSO, and Elastic Net. Health-related surveillance data on behaviors, health status, and medical service usage were used across ten databases, with sizes ranging from 10% to 100% of the full dataset, maintaining consistent outcome proportions. Database size was varied to assess its impact on prediction models, because it can significantly influence accuracy, overfitting, generalizability, statistical power, parameter estimation reliability, computational complexity, and variable selection. The stepwise and backward AIC models showed the highest accuracy, with an area under the ROC curve (AUC) of 0.889. Despite their sparsity, the LASSO and Elastic Net models also performed well. The study also found that the LASSO and Elastic Net models treated binary variables as more important. Importantly, the significance of variables remained consistent across different database sizes. The study shows no major variation in the model fit metric or in the number of variables for the stepwise and backward p-value models, irrespective of database size. LASSO and Elastic Net models surpassed other models across the various database sizes, and did so with fewer variables.
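The sampling design described in the abstract, ten databases from 10% to 100% with consistent outcome proportions, can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_subsets`, the synthetic outcome vector, the 20% positive rate, and the choice to make the subsets nested are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsets(outcomes, fractions, seed=0):
    """Return {fraction: sorted row indices}, sampling each outcome class
    separately so class proportions match those of the full data."""
    rng = random.Random(seed)

    # Group row indices by outcome label, then shuffle each group once.
    by_class = defaultdict(list)
    for idx, label in enumerate(outcomes):
        by_class[label].append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)

    subsets = {}
    for frac in fractions:
        chosen = []
        for indices in by_class.values():
            # Take the same fraction of every class (rounded to whole rows).
            # Using a prefix of the shuffled list makes the subsets nested,
            # which is an assumption of this sketch, not stated in the paper.
            k = round(frac * len(indices))
            chosen.extend(indices[:k])
        subsets[frac] = sorted(chosen)
    return subsets


# Hypothetical data: 1,000 records with a 20% positive outcome rate.
outcomes = [1] * 200 + [0] * 800
fractions = [i / 10 for i in range(1, 11)]  # 10%, 20%, ..., 100%
subsets = stratified_subsets(outcomes, fractions)

for frac, rows in subsets.items():
    positives = sum(outcomes[i] for i in rows)
    print(f"{frac:.0%}: n={len(rows)}, positive rate={positives / len(rows):.3f}")
```

Each subset could then be passed to the variable selection methods under comparison, so that differences in selected variables or AUC reflect database size rather than a shift in outcome prevalence.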

Journal: International Journal of Statistics in Medical Research
Publication Date: Oct 14, 2024
License type: CC BY-NC 4.0
