Abstract

Continuous variables – be they outcomes, exposures or covariates – are common in clinical studies. They are frequently converted into categorical variables during analysis. Pocock et al. [1] found that 84% of epidemiological articles from leading journals categorized continuous variables. Such categorization may be done for several reasons [2]. It is commonly perceived that categorization makes it easier to report and interpret final results (‘X doubles the risk of Y’ vs. ‘The risk of Y doubles when X increases by 10 units’). Researchers may be uncomfortable assuming a linear relationship between a continuous variable and the outcome but be unfamiliar with methods of handling non-linearity. Researchers and analysts may have less experience in dealing with continuous variables and prefer to make them behave like the more familiar categorical ones. Finally, it is also possible that physicians and epidemiologists, who frequently categorize continuous measures in their routine work (hypertensive or not, dyslipidemic or not, etc.), instinctively carry this habit from the clinic or field into their analysis.

However, categorizing continuous variables can cause problems. The first is information loss. Zhao and Kolonel [3] found that analyses with categorized continuous variables required more than 40% more patients to reach the same power as that achieved using continuous variables. Selvin [4] derived a formula to calculate the efficiency loss due to categorizing a continuous variable. Becher et al. [5] found that models with a categorized exposure variable removed only 67% of the confounding that was controlled when the continuous version was used. Categorizing continuous variables may not only miss the message, it can also get it wrong. Under some circumstances, categorizing continuous variables can give biased results. In a simulation study, Taylor and Yu [6] found that categorizing one continuous variable can artificially make another variable appear associated with the outcome. Selvin [4] showed that the cutpoint chosen when categorizing a continuous variable significantly changed the calculated odds ratio. Royston et al. [2] found that the significant association of the ‘S phase fraction’ with cancer outcomes repeatedly came and went depending on which cutpoint was used to define ‘abnormal’. Ragland [7] showed similar findings with prevalence ratios of hypertension.

Information loss and bias from categorizing continuous variables explain why statisticians frequently warn us to leave continuous variables alone [2, 8]. It appears that this advice was lost on investigators – ourselves included – who have developed risk stratification schemes for patients with atrial fibrillation (AF). Quantifying stroke risk in AF is essential for patient management: high-risk patients require oral anticoagulants, while low-risk patients (who stand to gain minimal absolute benefit from treatment) can avoid such therapy. Patient age is significantly associated with stroke

Received: February 9, 2008 Accepted: February 9, 2008 Published online: April 17, 2008

