Abstract

Background: Statistical model building requires selection of variables for a model depending on the model’s aim. In descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, regardless of whether data-driven selection procedures identify them. An open question is how reliable this assumed “background knowledge” truly is. In fact, “known” predictors might be findings from preceding studies which may also have employed inappropriate model building strategies.

Methods: We conducted a simulation study assessing the influence of treating variables as “known predictors” in model building when in fact this knowledge, resulting from preceding studies, might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered a “known” predictor if a predefined number of preceding studies identified it as relevant.

Results: Even if several preceding studies identified a variable as a “true” predictor, this classification is often a false positive. Moreover, variables not identified might still be truly predictive. This especially holds true if the preceding studies employed inappropriate selection methods such as univariable selection.

Conclusions: The source of “background knowledge” should be evaluated with care. Knowledge generated from preceding studies can cause misspecification.
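The design described in the Methods can be illustrated with a small sketch. It is not the authors' simulation code: the sample size, effect sizes, correlation structure, the normal-approximation cutoff of 1.96, and all function names (`simulate_study`, `univariable_selection`, `known_predictors`) are assumptions chosen only to show how a "known if at least k of m preceding studies selected it" rule might be applied after univariable selection.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_study(rng, n, betas, rho):
    """Hypothetical preceding study: predictors share a common factor
    (pairwise correlation rho); outcome is linear in them plus noise."""
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        row = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
               for _ in betas]
        X.append(row)
        y.append(sum(b * v for b, v in zip(betas, row)) + rng.gauss(0, 1))
    return X, y


def univariable_selection(X, y, crit=1.96):
    """Select every predictor whose marginal correlation with y passes
    an approximate two-sided 5% test (assumed cutoff, normal approximation)."""
    n = len(y)
    selected = set()
    for j in range(len(X[0])):
        r = pearson_r([row[j] for row in X], y)
        t = r * math.sqrt((n - 2) / (1 - r * r))
        if abs(t) > crit:
            selected.add(j)
    return selected


def known_predictors(selections, min_studies):
    """Variables selected in at least `min_studies` preceding studies."""
    counts = {}
    for sel in selections:
        for j in sel:
            counts[j] = counts.get(j, 0) + 1
    return {j for j, c in counts.items() if c >= min_studies}


rng = random.Random(1)
betas = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]  # assumed: first two are true predictors
selections = [univariable_selection(*simulate_study(rng, 100, betas, rho=0.5))
              for _ in range(5)]
known = known_predictors(selections, min_studies=3)
true_set = {j for j, b in enumerate(betas) if b != 0}
false_positives = known - true_set   # "known" but not truly predictive
missed = true_set - known            # truly predictive but not "known"
```

Because the noise variables in this sketch are correlated with the true predictors, univariable selection can flag them repeatedly across studies, which is the mechanism by which a "known" predictor may be a false positive.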

Highlights

  • Statistical model building requires selection of variables for a model depending on the model’s aim

  • For each rule, we evaluated the average relative frequency with which it correctly identified the true predictors, referred to as the “true positive rate” (TPR)

  • Independently of the scenario, the true predictor set was hardly ever selected: model selection frequencies (MSF) were always lower than 0.005
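The two performance measures named in the highlights can be sketched as follows. The exact definitions used in the paper are not given in this summary, so the code assumes that MSF is the share of simulation runs in which the selected set equals the true predictor set exactly, and that TPR is the average share of true predictors recovered per run. The function names and the example runs are hypothetical.

```python
def model_selection_frequency(selected_sets, true_set):
    """Assumed MSF: fraction of runs selecting exactly the true predictor set."""
    true_set = set(true_set)
    hits = sum(1 for s in selected_sets if set(s) == true_set)
    return hits / len(selected_sets)


def true_positive_rate(selected_sets, true_set):
    """Assumed TPR: average fraction of true predictors recovered per run."""
    true_set = set(true_set)
    shares = [len(set(s) & true_set) / len(true_set) for s in selected_sets]
    return sum(shares) / len(shares)


# Hypothetical selections from four simulation runs; true predictors are 0 and 1.
runs = [{0, 1}, {0, 1, 3}, {0}, {0, 1, 2, 4}]
msf = model_selection_frequency(runs, {0, 1})  # only the first run matches exactly
tpr = true_positive_rate(runs, {0, 1})
```

Under these assumed definitions a high TPR can coexist with a very low MSF, since a run that recovers every true predictor but also adds one noise variable counts fully toward TPR and not at all toward MSF.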


Summary

Introduction

Statistical model building requires selection of variables for a model depending on the model’s aim. In descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, regardless of whether data-driven selection procedures identify them. Statistical regression models play an important role in epidemiological and medical research, and variable selection is an essential aspect of model building in such studies. When the number of candidate predictors seems too large for a meaningful interpretation or for a reliable prediction, the question is how to separate the truly predictive variables from the non-predictive ones and how assumed background knowledge influences this procedure. Getting an overview of the relative performance of those methods is a challenging task [9].

(2021) 21:196
