Categorization of continuous covariates and complex regression models—friends or foes in intersectionality research

Adrian Richter, Sabina Ulbricht, Sarah Brockhaus

doi:10.1016/j.jclinepi.2024.111368

Abstract

Objectives: To reduce health inequities, it is important to identify intersections in characteristics of individuals subject to privilege or disadvantage. Different proposals for doing so have recently been published. One approach (1) considers models specified with first-order and all second-order effects, and another (2) uses stratification based on multiple covariates; both categorize continuous covariates. A simulation study was conducted to review both methods with regard to identification of intersections showing true differences, rate of false-positive results, and generalizability to independent data, compared with an established approach (3) of backward variable elimination according to the Bayesian information criterion (BE-BIC) combined with splines.

Study Design and Setting: R software was used to simulate the covariates age, sex, body mass index, education, and diabetes and to examine their association with a continuous frailty score for osteoporosis using multiple linear regression. In setting 1, none of the covariates was associated with the frailty score, that is, only noise was present in the data. In setting 2, the covariates age, sex, and their interaction were associated with the frailty score, such that only females above 55 years formed an intersection associated with an increased frailty score. All approaches were compared under varying sample sizes (N = 200–3000) and signal-to-noise ratios (SNRs, 0.5–4) in 1000 replications. For model evaluation, bootstrap resampling was used. The models were fitted in internal learning data and then used to predict outcomes in the internal validation data. The mean squared error (MSE) was used for comparison, and the frequency of false-positive findings was calculated.

Results: In setting 1, approaches 1 and 2 generated spurious effects in more than 90% of simulations across all sample sizes. With a smaller sample size, approach 3 (BE-BIC) selected the correct model in 36.5% of simulations, and with a larger sample size in 89.8%; it always had a lower number of spurious effects. MSE in independent data was generally higher for approaches 1 and 2 than for approach 3. In setting 2, approach 1 most frequently selected the correct interaction but frequently showed spurious effects (>75%). Across all sample sizes and SNRs, approach 3 generated spurious results least often and had the lowest MSE in independent data.

Conclusion: Categorization of continuous covariates is detrimental to studies on intersectionality. Due to high and unrestricted model complexity, such approaches are prone to spurious effects and often lack interpretability. Approach 3 (BE-BIC) is considerably more robust against spurious findings, showed better generalizability to independent data, and can be used with most statistical software. For intersectionality research, we consider it most important to describe relevant differences between intersections and to avoid nonreproducible and spurious findings.
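
The simulation idea behind setting 2 and backward elimination by BIC can be illustrated with a minimal sketch. The study itself used R; the Python/NumPy code below is not the authors' implementation. The covariate distributions, the effect size of 1.0, the definition of SNR as var(signal)/var(noise), and the helper names `simulate`, `candidate_terms`, `bic`, and `backward_bic` are all assumptions made for illustration. Splines and model hierarchy are omitted to keep the example short.

```python
import numpy as np

def simulate(n, snr, rng):
    """Setting-2-like data: only females above 55 have a raised score.
    Distributions and effect size are assumed, not taken from the study."""
    age = rng.uniform(30, 80, n)
    female = rng.integers(0, 2, n)
    bmi = rng.normal(27, 4, n)
    signal = 1.0 * ((female == 1) & (age > 55))
    # Assumed SNR definition: var(signal) / var(noise)
    noise_sd = np.sqrt(signal.var() / snr)
    y = signal + rng.normal(0, noise_sd, n)
    return age, female, bmi, y

def candidate_terms(age, female, bmi):
    """Main effects plus all second-order interactions, age and BMI kept continuous."""
    age_c = (age - age.mean()) / age.std()
    bmi_c = (bmi - bmi.mean()) / bmi.std()
    f = female.astype(float)
    return {
        "age": age_c, "female": f, "bmi": bmi_c,
        "age:female": age_c * f, "age:bmi": age_c * bmi_c, "female:bmi": f * bmi_c,
    }

def bic(X, y):
    """BIC of a Gaussian linear model, up to an additive constant."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def backward_bic(terms, y):
    """Drop one term at a time while doing so lowers BIC; the intercept is always kept.
    Hierarchy between main effects and interactions is not enforced here."""
    n = len(y)
    keep = list(terms)

    def design(names):
        return np.column_stack([np.ones(n)] + [terms[t] for t in names])

    best = bic(design(keep), y)
    while keep:
        scores = {t: bic(design([u for u in keep if u != t]), y) for t in keep}
        drop = min(scores, key=scores.get)
        if scores[drop] >= best:
            break  # no single removal improves BIC
        best = scores[drop]
        keep.remove(drop)
    return keep

rng = np.random.default_rng(1)
age, female, bmi, y = simulate(n=500, snr=2.0, rng=rng)
selected = backward_bic(candidate_terms(age, female, bmi), y)
print("terms kept by BE-BIC:", selected)
```

Repeating the last four lines over many seeds with `y` replaced by pure noise would give a rough false-positive frequency comparable in spirit to setting 1, under the assumptions stated above.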

Full Text