Abstract
Background
Logistic regression is widely used in fields such as healthcare, marketing, and finance to model binary outcomes (e.g., sick vs. not sick). However, when it is applied to complex survey data, which arise from complex sampling designs, specific methodological issues are often overlooked.

Methods
Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines, this systematic review searched the PubMed and ScienceDirect databases for the period January 2015 to December 2021, focusing primarily on the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS). A total of 810 articles met the inclusion criteria and were included in the analysis. The review examined several methodological issues in the use of logistic regression, including model adequacy assessment, handling of dependence among observations, use of the complex survey design, and treatment of missing values and outliers, among others.

Results
Among the selected articles, 96% used DHS data, 3% used MICS data, and 1% used both. Only 19.7% of the studies employed multilevel mixed-effects logistic regression to account for data dependencies. Model validation techniques were not reported in 94.8% of the studies, and the bootstrap, jackknife, and other resampling methods saw limited use. Sample weights, primary sampling units (PSUs), and strata variables were used together in 40.4% of the articles, whereas 41.7% of the studies used none of these variables, which could have produced biased results. Goodness-of-fit assessments were not mentioned in 75.3% of the articles; among those that reported them, the Hosmer–Lemeshow and likelihood ratio tests were the most common. Furthermore, 95.8% of the studies did not mention outliers, only 41.0% corrected for missing information, and only 2.7% applied imputation techniques.

Conclusions
This systematic review highlights important gaps in the use of logistic regression with complex survey data: data dependencies, survey design, and model validation are often overlooked, and outliers, missing data, and goodness-of-fit assessments are frequently neglected. These gaps point to the need for clearer methodological standards and more thorough reporting to improve the reliability of results. Future research should follow such standards consistently to produce more dependable findings.
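
As a rough illustration of the survey-design elements named in this abstract (sample weights, PSUs, strata, and jackknife resampling), the sketch below uses only the Python standard library. It is not taken from the review and is not the method of any reviewed study. It relies on the fact that a logistic regression with an intercept and a single binary predictor has a slope equal to the log of the cross-product ratio of the 2x2 table, so the weighted slope can be computed from weighted cell totals. The data in `PSU_ROWS`, the stratum labels, and the function names `log_odds_ratio` and `jackknife_se` are all hypothetical, and the standard error uses a delete-one-PSU stratified jackknife as one possible design-based choice.

```python
import math
from collections import defaultdict

# Hypothetical PSU-level counts (not taken from any DHS or MICS file):
# (stratum, psu_id, sample_weight, n11, n10, n01, n00)
# where n_xy counts respondents with exposure x and outcome y.
PSU_ROWS = [
    ("urban", 1, 0.6, 3, 5, 2, 6),
    ("urban", 2, 0.6, 4, 4, 1, 7),
    ("urban", 3, 0.6, 2, 6, 2, 6),
    ("rural", 4, 1.8, 6, 2, 3, 5),
    ("rural", 5, 1.8, 5, 3, 4, 4),
    ("rural", 6, 1.8, 7, 1, 3, 5),
]


def log_odds_ratio(rows, use_weights=True):
    """Log odds ratio from (optionally weighted) 2x2 cell totals.

    With one binary predictor this equals the logistic regression slope.
    """
    totals = defaultdict(float)
    for _stratum, _psu, weight, n11, n10, n01, n00 in rows:
        w = weight if use_weights else 1.0
        totals["11"] += w * n11
        totals["10"] += w * n10
        totals["01"] += w * n01
        totals["00"] += w * n00
    return math.log(
        totals["11"] * totals["00"] / (totals["10"] * totals["01"])
    )


def jackknife_se(rows):
    """Delete-one-PSU stratified jackknife standard error of the log odds ratio.

    Assumes every stratum contains at least two PSUs.
    """
    theta = log_odds_ratio(rows)
    psus_by_stratum = defaultdict(list)
    for row in rows:
        psus_by_stratum[row[0]].append(row[1])

    variance = 0.0
    for stratum, psu_ids in psus_by_stratum.items():
        n_h = len(psu_ids)
        for dropped in psu_ids:
            replicate = []
            for s, p, w, *cells in rows:
                if s == stratum and p == dropped:
                    continue  # leave this PSU out of the replicate
                if s == stratum:
                    w *= n_h / (n_h - 1)  # reweight the stratum's remaining PSUs
                replicate.append((s, p, w, *cells))
            diff = log_odds_ratio(replicate) - theta
            variance += (n_h - 1) / n_h * diff ** 2
    return math.sqrt(variance)


weighted = log_odds_ratio(PSU_ROWS)
unweighted = log_odds_ratio(PSU_ROWS, use_weights=False)
print(f"unweighted odds ratio: {math.exp(unweighted):.3f}")
print(f"weighted odds ratio:   {math.exp(weighted):.3f}")
print(f"jackknife SE of weighted log odds ratio: {jackknife_se(PSU_ROWS):.3f}")
```

In this toy data the rural PSUs carry larger weights, so the weighted and unweighted odds ratios differ, which is the kind of discrepancy that can arise when weights are ignored. A real analysis with several covariates would normally rely on dedicated survey software rather than a hand-rolled estimator like this one.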