Abstract
Machine learning is increasingly applied in medical research, particularly for selecting variables for predictive modelling. By identifying relevant variables, researchers can improve model accuracy and reliability and reduce overfitting, which supports better clinical decisions. Efficient use of resources and the validity of medical research findings depend on selecting the right variables. However, few studies compare the performance of classical and modern variable selection methods on health datasets, so a critical evaluation is needed to choose the most suitable approach. We analysed the performance of six variable selection methods, including stepwise, forward, and backward selection using p-value and AIC, LASSO, and Elastic Net. Health-related surveillance data on behaviours, health status, and medical service usage were used across ten databases, with sizes ranging from 10% to 100% of the data and consistent outcome proportions maintained in each. Database size was varied to assess its impact on the prediction models, because it can significantly influence accuracy, overfitting, generalizability, statistical power, parameter estimation reliability, computational complexity, and variable selection. The stepwise and backward AIC models showed the highest accuracy, with an area under the ROC curve (AUC) of 0.889. Despite their sparsity, the LASSO and Elastic Net models also performed well, and they treated binary variables as more important. Importantly, the significance of variables remained consistent across database sizes. For the stepwise and backward p-value models, neither the model fit metric nor the number of selected variables varied substantially with database size. Across the various database sizes, the LASSO and Elastic Net models surpassed the other models while using fewer variables.
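The subsampling design described above (databases at 10% to 100% of the records, each keeping the same outcome proportion) can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_subsample`, the synthetic outcome vector, and the 20% prevalence are all assumptions made for illustration only.

```python
import random


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted row indices of a subsample that keeps each outcome
    class at (approximately) its original proportion.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)

    # Group row indices by outcome value.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    # Draw the same fraction from every class, without replacement.
    chosen = []
    for idxs in by_class.values():
        k = round(len(idxs) * fraction)
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Assumed synthetic outcome: 200 positives among 1000 records.
labels = [1] * 200 + [0] * 800

# Build ten nested-size databases: 10%, 20%, ..., 100% of the records.
subsets = {}
for pct in range(10, 101, 10):
    idx = stratified_subsample(labels, pct / 100)
    subsets[pct] = idx
    prevalence = sum(labels[i] for i in idx) / len(idx)
    print(f"{pct:3d}% -> n={len(idx):4d}, prevalence={prevalence:.3f}")
```

Each variable selection method would then be fitted separately on every subset, so that changes in AUC or in the number of retained variables can be attributed to database size rather than to a shift in outcome prevalence.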
More From: International Journal of Statistics in Medical Research