Abstract
The primary goal of this research is to examine how balancing data affects prediction quality and inference in multilevel logistic regression models. Logistic regression is a valuable approach for modeling binary outcomes, which are common in health applications. The class imbalance problem, in which one of the two outcome categories occurs much more often than the other, is common in healthcare data, for example when modeling risk factors for rare diseases. The issue is particularly relevant for medical data that combine individual-level measurements with other data sources measured at the level of a geographic region, such as environmental risk factors. In this work, both prediction and model interpretation are of interest. A simulation study is proposed to test the impact of balancing strategies on parameter estimation, inference, and predictive performance in the multilevel logistic model. The simulated data emulate characteristics of a Gestational Diabetes Mellitus (GDM) dataset from Indiana's Medicaid program. Several datasets were simulated with varying levels of complexity, involving the balance of both the outcome variable and the predictors. These datasets exhibited high- or low-frequency occurrences in specific intersections of variables, often called ‘cells.’ The impact of the balancing strategies on prediction and inference was assessed using several techniques, including the two one-sided tests (TOST) equivalence test, power analysis, and predictive measures. To the best of our knowledge, this is the first study to explore the impact of using balanced samples on coefficient estimation and prediction measures in multilevel logistic modeling, and it finds evidence of the benefits of using balanced samples in this context.
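As an illustration of the kind of setup described above, the following minimal sketch simulates an imbalanced binary outcome with region-level random intercepts and then balances it by random undersampling of the majority class. Everything in it is an assumption made for illustration: the abstract does not state which balancing strategies were used, and the function names, parameter values (number of regions, intercept, slope, random-effect standard deviation), and the choice of undersampling are hypothetical and not taken from the GDM dataset or the authors' simulation design.

```python
import numpy as np


def simulate_multilevel(n_regions=50, n_per_region=200, intercept=-3.0,
                        beta=0.8, sigma_u=0.5, seed=0):
    """Simulate a two-level logistic model (hypothetical parameter values).

    Individuals are nested in regions; each region has its own random
    intercept, and the strongly negative fixed intercept makes y = 1 rare.
    """
    rng = np.random.default_rng(seed)
    region = np.repeat(np.arange(n_regions), n_per_region)
    u = rng.normal(0.0, sigma_u, size=n_regions)   # region-level random intercepts
    x = rng.normal(size=region.size)               # individual-level predictor
    logit = intercept + beta * x + u[region]
    p = 1.0 / (1.0 + np.exp(-logit))
    y = rng.binomial(1, p)
    return region, x, y


def undersample_majority(y, seed=0):
    """Return sorted row indices of a sample with equal class counts.

    Keeps every minority-class row and a random subset of the majority
    class of the same size (one possible balancing strategy).
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if pos.size <= neg.size else (neg, pos)
    keep = rng.choice(majority, size=minority.size, replace=False)
    return np.sort(np.concatenate([minority, keep]))


region, x, y = simulate_multilevel()
idx = undersample_majority(y)
print("prevalence before balancing:", y.mean())
print("prevalence after balancing: ", y[idx].mean())
```

A multilevel logistic model could then be fitted to both the full and the balanced sample to compare coefficient estimates and predictive measures. Note that undersampling changes the outcome prevalence, so the fitted intercept in the balanced sample is not directly comparable to the one from the full sample.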