Statistical model building: Background “knowledge” based on inappropriate preselection causes misspecification

Lorena Hafermann, Heiko Becher, Georg Heinze, Nadja Klein, Geraldine Rauch, Carolin Herrmann

doi:10.1186/s12874-021-01373-z

Abstract

Background: Statistical model building requires selection of variables for a model depending on the model’s aim. In descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, independent of their identification by data-driven selection procedures. An open question is how reliable this assumed “background knowledge” truly is. In fact, “known” predictors might be findings from preceding studies which may also have employed inappropriate model building strategies.

Methods: We conducted a simulation study assessing the influence of treating variables as “known predictors” in model building when in fact this knowledge, resulting from preceding studies, might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered a “known” predictor if a predefined number of preceding studies identified it as relevant.

Results: Even if several preceding studies identified a variable as a “true” predictor, this classification is often a false positive. Moreover, variables not identified might still be truly predictive. This holds especially if the preceding studies employed inappropriate selection methods such as univariable selection.

Conclusions: The source of “background knowledge” should be evaluated with care. Knowledge generated from preceding studies can cause misspecification.
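The simulation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the data-generating model (equicorrelated normal predictors, linear outcome), the significance cutoff, and all function and parameter names (`simulate_study`, `univariable_selection`, `known_predictors`, `min_hits`) are assumptions made for illustration only.

```python
import math
import random

def simulate_study(rng, n, beta, rho):
    """One hypothetical preceding study: equicorrelated standard-normal
    predictors (pairwise correlation rho) and a linear outcome with unit noise."""
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor that induces the correlation
        row = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
               for _ in beta]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0))
    return X, y

def univariable_selection(X, y, z_crit=1.96):
    """Flag variable j when its marginal correlation with y is significant
    (t statistic compared with a normal critical value, an approximation)."""
    n = len(y)
    y_mean = sum(y) / n
    syy = sum((v - y_mean) ** 2 for v in y)
    selected = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        x_mean = sum(col) / n
        sxx = sum((v - x_mean) ** 2 for v in col)
        sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(col, y))
        r = sxy / math.sqrt(sxx * syy)
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
        selected.append(abs(t) > z_crit)
    return selected

def known_predictors(rng, n_studies, min_hits, n, beta, rho):
    """Apply the rule: a variable counts as "known" if at least min_hits
    of n_studies simulated preceding studies selected it."""
    hits = [0] * len(beta)
    for _ in range(n_studies):
        X, y = simulate_study(rng, n, beta, rho)
        for j, sel in enumerate(univariable_selection(X, y)):
            hits[j] += int(sel)
    return [h >= min_hits for h in hits]

# Assumed toy setting: two true predictors, two noise variables correlated with them.
flags = known_predictors(random.Random(1), n_studies=5, min_hits=3,
                         n=200, beta=[1.0, 1.0, 0.0, 0.0], rho=0.5)
print(flags)
```

In this assumed setting the noise variables are marginally correlated with the outcome through the true predictors, so univariable selection in the preceding studies tends to flag them as well. This is the kind of false-positive “knowledge” the abstract warns about.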

Highlights

Statistical model building requires selection of variables for a model depending on the model’s aim.
We evaluated, for each rule, the average relative frequency of correctly identifying the true predictors, referred to as the “true positive rate” (TPR).
Independently of the scenario, the true predictor set was hardly ever selected: model selection frequencies (MSF) were always lower than 0.005.
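The two performance measures named in the highlights can be sketched as follows. The exact definitions used in the paper are not given on this page, so the code assumes one plausible reading: TPR as the share of true predictors flagged, averaged over simulation runs, and MSF as the share of runs in which the flagged set equals the true predictor set exactly. The function names and the toy data are hypothetical.

```python
def true_positive_rate(selections, truth):
    """Average, over runs, of the fraction of true predictors that were flagged."""
    true_idx = [j for j, is_true in enumerate(truth) if is_true]
    per_run = [sum(sel[j] for j in true_idx) / len(true_idx) for sel in selections]
    return sum(per_run) / len(per_run)

def model_selection_frequency(selections, truth):
    """Share of runs whose flagged set matches the true predictor set exactly."""
    matches = sum(list(sel) == list(truth) for sel in selections)
    return matches / len(selections)

# Toy input: four simulation runs over three candidate variables.
truth = [True, True, False]
selections = [
    [True, True, False],   # exact match
    [True, False, False],  # misses one true predictor
    [True, True, True],    # includes a false positive
    [False, True, False],  # misses one true predictor
]
print(true_positive_rate(selections, truth))         # 0.75
print(model_selection_frequency(selections, truth))  # 0.25
```

The toy numbers show why the two measures can diverge: individual true predictors may be found fairly often while the complete, exactly correct set is recovered only rarely.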

Summary

Introduction

Statistical model building requires selection of variables for a model depending on the model’s aim. In descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, independent of their identification by data-driven selection procedures. Statistical regression models play an important role in epidemiological and medical research, and variable selection is an essential aspect of model building in such studies. If the number of candidate predictors seems too large for a meaningful interpretation or for a reliable prediction, the question is how to separate the truly predictive variables from the non-predictive ones and how assumed background knowledge influences this procedure. Getting an overview of the relative performance of those methods is a challenging task [9].

Journal: BMC Medical Research Methodology
Publication Date: Sep 29, 2021
Citations: 9
License type: open-access
