Variable selection methods were poorly reported but rarely misused in major medical journals: Literature review

T Pressat-Laffouilhère, R Jouffroy, A Leguillou, G Kerdelhue, J Benichou, A Gillibert

doi:10.1016/j.jclinepi.2021.07.006

Abstract

Objective: This work reviews the reporting, practice and misuse of knowledge-based and data-driven variable selection methods in five highly cited medical journals. Unlike previous reviews, it also considers variable recoding and interactions.

Study Design and Setting: Original observational studies with a predictive or explicative research question and multivariable analyses, published in N. Engl. J. Med., Lancet, JAMA, Br. Med. J. and Ann. Intern. Med. between 2017 and 2019, were searched. Article screening was performed by a single reader; data extraction was performed by two readers, with a third reader participating in case of disagreement. The use of data-driven variable selection methods for causal explicative questions was considered misuse.

Results: A total of 488 articles were included. The variable selection method was unclear in 234 (48%) articles, data-driven in 78 (16%) articles and knowledge-based in 176 (36%) articles. The most common data-driven methods were univariate selection (n = 22, 4.5%) and model comparisons or testing for interaction (n = 17, 3.5%). Data-driven methods were misused in 51 (10.5%) articles.

Conclusion: Overall, reporting of variable selection methods is insufficient. Data-driven methods seem to be used in only a minority of articles in the big five medical journals.
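The proportions in the Results can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic, not part of the article: the counts are taken from the abstract, while the variable and function names are illustrative assumptions.

```python
# Counts as reported in the abstract; all names here are illustrative
# assumptions and do not come from the article.
TOTAL_ARTICLES = 488

counts = {
    "unclear": 234,
    "data_driven": 78,
    "knowledge_based": 176,
    "univariate_selection": 22,
    "model_comparison_or_interaction_test": 17,
    "data_driven_misused": 51,
}


def percent(n, total=TOTAL_ARTICLES):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


percentages = {name: percent(n) for name, n in counts.items()}

# The three selection-method categories should partition the included articles.
categories_sum = (
    counts["unclear"] + counts["data_driven"] + counts["knowledge_based"]
)

for name, pct in percentages.items():
    print(f"{name}: {counts[name]}/{TOTAL_ARTICLES} = {pct}%")
print(f"sum of the three categories: {categories_sum}")
```

The percentages in the abstract use the 488 included articles as the denominator throughout, including for the individual data-driven methods and for misuse.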

Full Text