Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents

Caroline Bennette, Andrew Vickers

doi:10.1186/1471-2288-12-21

Abstract

Background: Quantiles are a staple of epidemiologic research: in contemporary epidemiologic practice, continuous variables are typically categorized into tertiles, quartiles and quintiles as a means to illustrate the relationship between a continuous exposure and a binary outcome.

Discussion: In this paper we argue that this approach is highly problematic and present several potential alternatives. We also discuss the perceived drawbacks of these newer statistical methods and the possible reasons for their slow adoption by epidemiologists.

Summary: The use of quantiles is often inadequate for epidemiologic research with continuous variables.

Highlights

Quantiles are a staple of epidemiologic research: in contemporary epidemiologic practice, continuous variables are typically categorized into tertiles, quartiles and quintiles as a means to illustrate the relationship between a continuous exposure and a binary outcome.
Epidemiology is often introduced using examples in which both exposure and outcome are considered in binary terms: research participants are defined as having, say, lung cancer or not, and being smokers or not, and the proportion of smokers compared between cases and controls.
Categorization of continuously distributed exposure variables is associated with three problems: first, it involves multiple hypothesis testing with pairwise comparisons of quantiles; second, it requires an unrealistic step-function of risk that assumes homogeneity of risk within groups, leading to both a loss of power and inaccurate estimation; and third, it leads to difficulty comparing results across studies due to the data-driven cut points used to define categories.

Summary

Discussion

Categorization of continuously distributed exposure variables is associated with three problems: first, it involves multiple hypothesis testing with pairwise comparisons of quantiles; second, it requires an unrealistic step-function of risk that assumes homogeneity of risk within groups, leading to both a loss of power and inaccurate estimation; and third, it leads to difficulty comparing results across studies due to the data-driven cut points used to define categories.

Meaningful comparisons derived from the non-linear model can concisely describe the association: we reported the difference in absolute risk of recurrence for a typical patient treated by a surgeon who had performed 10 prior procedures and by a surgeon who had performed 250 prior procedures. These values were chosen after consultation with surgeons and were intended to reflect meaningful levels of experience; the estimates are obtained from the model including non-linear terms, not from a categorization approach.

Analyses of continuous variables can be presented in readily meaningful terms; we would argue that clinically relevant comparisons are often more readily understood and more useful than the estimates derived from data-driven quantiles.

Another argument against regression techniques involving non-linear terms is that the resulting models are prone to overfit [14].

Competing interests

The authors declare that they have no competing interests.
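As an illustration of the contrast drawn in the Discussion, the following is a minimal sketch on simulated data, not the authors' analysis. It compares a quartile step function with a logistic model containing a non-linear term, then reads off the absolute risk difference at two chosen exposure values (10 and 250, echoing the surgeon-experience example). Everything in it is an assumption made for illustration: the simulated risk curve, the sample size, the use of a squared log-exposure term as the non-linear term, and the helper names `design`, `fit_logistic`, and `predict`.

```python
import numpy as np

def design(x, center=50.0):
    """Design matrix: intercept, log-exposure, squared log-exposure.

    The squared term is an assumed stand-in for whatever non-linear
    terms an analyst would actually choose (e.g. splines).
    """
    u = np.log(np.asarray(x, dtype=float) / center)
    return np.column_stack([np.ones_like(u), u, u ** 2])

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def predict(beta, x):
    """Predicted risk at exposure value(s) x under the fitted model."""
    return 1.0 / (1.0 + np.exp(-(design(x) @ beta)))

# Simulated (hypothetical) data: risk falls smoothly as exposure rises.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 300.0, size=5000)
true_risk = 1.0 / (1.0 + np.exp(1.5 + 0.5 * np.log(x / 50.0)))
y = rng.binomial(1, true_risk)

# Quartile approach: data-driven cut points, one risk estimate per group,
# so everyone within a quartile is assigned the same risk (a step function).
cuts = np.quantile(x, [0.25, 0.5, 0.75])
group = np.digitize(x, cuts)  # group index 0..3
quartile_risk = np.array([y[group == g].mean() for g in range(4)])
step_risk = quartile_risk[group]

# Continuous approach: keep exposure continuous and allow non-linearity.
beta = fit_logistic(design(x), y)
smooth_risk = predict(beta, x)

# Comparison at pre-specified, meaningful exposure values.
risk_10, risk_250 = predict(beta, [10.0, 250.0])
risk_difference = risk_10 - risk_250

# Because the data are simulated, each approach can be checked against truth.
mse_step = np.mean((step_risk - true_risk) ** 2)
mse_smooth = np.mean((smooth_risk - true_risk) ** 2)

print("quartile cut points:", np.round(cuts, 1))
print("risk by quartile:", np.round(quartile_risk, 3))
print("model risk at 10 vs 250:", round(float(risk_10), 3), round(float(risk_250), 3))
print("absolute risk difference:", round(float(risk_difference), 3))
print("MSE vs true risk (step, smooth):", mse_step, mse_smooth)
```

In this simulation the fitted model has the same functional form as the data-generating curve, which favours the continuous approach; with real data the choice of non-linear terms would itself need justification. The quartile cut points also depend on the sampled exposure distribution, which is why such categories are hard to compare across studies.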

11. Greenland S

14. Weinberg CR

Journal: BMC Medical Research Methodology	Publication Date: Feb 29, 2012
Citations: 346	License type: CC BY 2.0
