Abstract
Missing values are an important issue in research and must be accounted for in any analysis. In medical data in particular, missing values can undermine confidence in patient evaluation and make it impossible to classify people according to their level of health or disease. Furthermore, multicollinearity among the independent variables can lead to misleading results. This research therefore studies the efficiency of missing data imputation techniques for logistic regression with complete multicollinearity. The imputation methods considered were MEAN (mean imputation), MI (multiple imputation), KNN (k-nearest neighbor imputation), RF (random forest imputation), SRI (stochastic regression imputation), and BRI (Bayesian linear regression imputation). The simulation was conducted with sample sizes of 20, 50, 100, 150, 200, 500, and 1000, and with 10, 20, 30, and 40% of the data missing. Efficiency was compared using the EMSE (estimated mean square error). The results showed that the RF method was the most effective when the sample size was large and the percentage of missing data was high, whereas the MEAN method performed worst in all cases. When the sample size was small and the proportion of missing data was high, KNN might be a preferable alternative for imputation. Future work may need to examine a larger number of independent variables and different patterns of multicollinearity.
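As a rough illustration of the kind of comparison described above, the following minimal sketch contrasts mean imputation with a simple k-nearest-neighbor imputation on two strongly correlated predictors. It is not the study's procedure: the correlation of 0.95, the value k = 5, the missing-completely-at-random mechanism, and every function name are assumptions made for this example. The error is measured on the imputed values themselves, which may differ from how the EMSE was defined in the study, and no logistic regression is fitted.

```python
import random

def simulate(n, rho, rng):
    """Draw n pairs (x1, x2) whose correlation is roughly rho (assumed design)."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rho * x1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        data.append((x1, x2))
    return data

def mask_x2(data, frac, rng):
    """Pick the rows whose x2 is treated as missing, completely at random."""
    n_missing = int(round(frac * len(data)))
    return set(rng.sample(range(len(data)), n_missing))

def impute_mean(data, missing):
    """Replace every missing x2 with the mean of the observed x2 values."""
    observed = [x2 for i, (_, x2) in enumerate(data) if i not in missing]
    mean = sum(observed) / len(observed)
    return {i: mean for i in missing}

def impute_knn(data, missing, k=5):
    """Replace each missing x2 with the mean x2 of the k rows closest in x1."""
    donors = [(x1, x2) for i, (x1, x2) in enumerate(data) if i not in missing]
    imputed = {}
    for i in missing:
        x1 = data[i][0]
        nearest = sorted(donors, key=lambda d: abs(d[0] - x1))[:k]
        imputed[i] = sum(d[1] for d in nearest) / len(nearest)
    return imputed

def mse(data, imputed):
    """Mean squared difference between imputed and true x2 values."""
    return sum((data[i][1] - v) ** 2 for i, v in imputed.items()) / len(imputed)

def run(n=200, frac=0.3, rho=0.95, reps=200, seed=0):
    """Average the imputation error of each method over reps simulated data sets."""
    rng = random.Random(seed)
    totals = {"MEAN": 0.0, "KNN": 0.0}
    for _ in range(reps):
        data = simulate(n, rho, rng)
        missing = mask_x2(data, frac, rng)
        totals["MEAN"] += mse(data, impute_mean(data, missing))
        totals["KNN"] += mse(data, impute_knn(data, missing))
    return {name: total / reps for name, total in totals.items()}

if __name__ == "__main__":
    print(run())
```

Varying `n` and `frac` over the sample sizes and missing percentages listed above would reproduce the shape of the simulation grid, though not its results. Mean imputation ignores the correlated predictor entirely, which is one plausible reason it fares poorly when predictors are strongly collinear.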