Coronary heart disease (CHD) is common and includes the diagnoses angina pectoris, myocardial infarction, and coronary disease death. Predicting the first occurrence of these events is important and may affect clinical decisions and care. It can be expected that more than a third of adult Americans will develop CHD during their lifetime, and improvements in diagnosis and care may provide great benefits. The article by Tzoulaki and colleagues assesses the scientific literature concerning efforts to improve the prediction of CHD over and above the Framingham risk score. The authors report that many articles have provided information on new risk factors, but the study designs and data analyses in those reports raise concerns about the usefulness of the new information and about how improvement over the Framingham risk score was assessed. Researchers and clinicians should understand how CHD risk estimation is undertaken and evaluated, as these methods are frequently used to assess the risk of developing disease across a variety of health disciplines. Risk estimates for initial CHD events should be derived from studies based on complete information for carefully measured risk factors at baseline, adequate follow-up, and reliable outcome data. A common initial step is to fit univariate or age-adjusted proportional hazards regression models. When possible, the variables of interest are analyzed as continuous measures. Factors that are significant in the univariate analyses are then considered for inclusion in multivariable prediction models. With this approach, Framingham investigators developed a set of variables that included age, sex, systolic blood pressure, cholesterol level, high-density lipoprotein cholesterol level, diabetes mellitus, and current smoking. This approach was further evaluated with external validation sets from cohort studies across the United States.
Including newer variables is of considerable interest because such factors might improve the prediction of CHD and make prevention strategies more effective. A variety of performance criteria are used to evaluate the usefulness of CHD risk prediction; the brief summary that follows explains the terms used to interpret these studies, including relative risk, discrimination, calibration, and reclassification. For each risk factor, proportional hazards modeling yields regression coefficients for a study cohort. The relative risk of a variable is computed by exponentiating its regression coefficient in the multivariable regression model. This measure estimates the ratio of the risk for an individual with a given risk factor, such as cigarette smoking, to the risk for an individual without it. An analogous approach can be used for continuous variables, by reporting the effect of a specific number of units of the variable or the difference in risk associated with a difference of 1 standard deviation in the factor. Discrimination is the ability of a statistical model to separate those who experience clinical CHD events from those who do not. The C statistic, analogous to the area under a receiver operating characteristic curve, is the typical performance measure used. This statistic represents an estimate of the probability that a model assigns a higher risk to those who develop CHD within a specified follow-up than to those who do not and represents a composite of the overall sensitivity and specificity of the prediction equation. Values for the C statistic range from 0.00 to 1.00; 0.50 reflects discrimination by chance. Higher values generally indicate better separation of those who develop CHD from those who do not. The C statistic for the prediction of CHD is typically in the 0.70 range.
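The two calculations described above can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the method used in any particular study: the coefficient, the predicted risks, and the outcomes are hypothetical values, the function names are invented for this example, and the C statistic is computed for a binary outcome at a fixed follow-up, ignoring censoring and time to event.

```python
import math
from itertools import product


def relative_risk(coefficient, units=1.0):
    """Relative risk for a difference of `units` in a predictor.

    `coefficient` is a regression coefficient from a proportional
    hazards model; for a yes/no factor such as smoking, units is 1.
    """
    return math.exp(coefficient * units)


def c_statistic(predicted_risks, had_event):
    """Share of (event, non-event) pairs ranked correctly by the model.

    A pair counts as 1 when the person with the event was assigned the
    higher predicted risk, and as 0.5 when the two risks are tied.
    Simplification: binary outcome only, no censoring.
    """
    cases = [r for r, e in zip(predicted_risks, had_event) if e]
    controls = [r for r, e in zip(predicted_risks, had_event) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for case, control in product(cases, controls):
        if case > control:
            score += 1.0
        elif case == control:
            score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical coefficient for current smoking, for illustration only.
beta_smoking = 0.5
print(round(relative_risk(beta_smoking), 2))

# Hypothetical predicted risks and observed outcomes (1 = CHD event).
risks = [0.05, 0.10, 0.20, 0.30, 0.15, 0.25]
events = [0, 0, 1, 1, 0, 0]
print(c_statistic(risks, events))
```

In this toy data set, 7 of the 8 event and non-event pairs are ranked correctly, so the C statistic is 0.875; a model that ranked pairs at random would be expected to score about 0.50.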
The error associated with C statistic estimates can be estimated and used to compare differences in risk prediction models. Calibration measures how closely predicted estimates of absolute risk agree with actual outcomes. To present calibration analyses, the data are often separated into deciles of risk, and observed rates are tested for differences from what was expected from the estimating equation. A version of the Hosmer-Lemeshow χ² statistic can be used to evaluate how well the observed and expected agree; smaller χ² values generally indicate good calibration. A CHD prediction model might be recalibrated if it provides relatively good ranking of risk and discrimination, but the model sys-