A Proposed Design and Analysis for Comparing Digital and Analog Mammography

Stuart G Baker, Paul F Pinsky

doi:10.1198/016214501753168136

Abstract

Because randomized trials have shown that it reduces breast cancer mortality, analog mammography for the early detection of breast cancer has come into widespread use. Recently, several manufacturers have developed digital mammography, which promises great advantages in the storage and transmission of images. We were asked to design a study to compare the two types of mammography in terms of their performance for the early detection of breast cancer. A standard measure of mammography performance is the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate at each cutpoint of an ordered classification of the mammography images. Methods for study design and data analysis based on ROC curves are well developed for diagnostic tests, particularly in radiology. But comparing the performance of mammography for the early detection of breast cancer among asymptomatic women raises special considerations that motivate new designs and methodology. First, digital mammography may cost substantially more than analog mammography. If so, the standard paired design, in which each subject undergoes both types of mammography, may be more expensive than necessary. To reduce costs, we propose a partial testing design, in which all subjects undergo analog mammography, and those recommended for biopsy, together with a random sample of those not recommended for biopsy, also undergo digital mammography. Second, the false-positive rate for analog mammography, defined as the rate of unnecessary biopsy, is near 1%. A standard ROC analysis that compares areas under the entire ROC curve would summarize performance over false-positive rates that are not relevant for evaluating cancer screening. As a more appropriate alternative, we propose basing inference on the areas under the small portion of each ROC curve near the false-positive rates corresponding to a biopsy recommendation.
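The partial testing design amounts to a selection rule applied after the analog reading. The sketch below is a minimal illustration under stated assumptions, not the authors' protocol: `Subject`, `select_for_digital`, and `sample_fraction` are hypothetical names; the random sample of subjects not recommended for biopsy is drawn by independent (Bernoulli) sampling; and each selected subject carries an inverse-probability weight, a common device for two-phase samples that may differ from the estimator the paper actually uses.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    # Hypothetical record: only the analog result matters for selection.
    subject_id: int
    analog_biopsy_recommended: bool


def select_for_digital(subjects, sample_fraction, seed=None):
    """Choose which subjects also receive digital mammography.

    Every subject recommended for biopsy on analog mammography is selected.
    Each remaining subject is selected independently with probability
    sample_fraction (an assumed Bernoulli sampling scheme).
    Returns (subject, inverse-probability weight) pairs.
    """
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    selected = []
    for s in subjects:
        if s.analog_biopsy_recommended:
            selected.append((s, 1.0))  # selected with certainty
        elif rng.random() < sample_fraction:
            selected.append((s, 1.0 / sample_fraction))
    return selected


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical cohort: roughly 1% recommended for biopsy on analog reading.
    cohort = [Subject(i, rng.random() < 0.01) for i in range(35_000)]
    chosen = select_for_digital(cohort, sample_fraction=0.25, seed=2)
    print(len(chosen), "of", len(cohort), "also receive digital mammography")
```

Only the sampling fraction among subjects not recommended for biopsy drives the digital workload, which is where the cost savings over the paired design come from.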
Third, the vast majority of screened subjects are not biopsied and so have an unknown cancer state at the time of screening. To make inferences about the performance of a cancer screening test, the standard approach is to follow subjects who were not biopsied for some period, usually 1 year, and to assume that those who developed cancer were missed on screening and those who did not were cancer-free at screening. Unfortunately, this follow-up period can greatly lengthen the duration of the study. To compare the performance of digital and analog mammography without a follow-up period, we propose estimating the ratio of areas under the ROC curves near the small false-positive rates associated with a biopsy recommendation. To compute sample sizes, we take as the null hypothesis that the ratio of partial ROC areas is 1 and consider two alternative hypotheses, ratios of 1.6 and 2, both indicating superior performance for digital mammography. We assume a breast cancer prevalence of .003 and specify various parameters for the shapes of the ROC curves and their dependence. For a two-sided type I error of .05 and a power of .9, a standard paired design would require that 22,000 subjects undergo both analog and digital mammography. For the same type I error and power, the proposed partial testing design would require that 35,000 subjects undergo analog mammography and 10,000 subjects undergo both analog and digital mammography. Compared with the paired design, the reduction in cost per subject is 23% if digital mammography costs four times as much as analog mammography and 41% if it costs 10 times as much.
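To make the quantity in the hypotheses concrete, a ratio of partial ROC areas can be illustrated under a binormal ROC model, in which the true-positive rate at false-positive rate t is Phi(a + b * Phi^-1(t)), with Phi the standard normal distribution function. The binormal model, the parameters (a, b), the false-positive range [0, 0.02], and the function names are assumptions chosen for illustration; they are not the shape parameters specified in the study design, and this sketch computes the ratio from known curves rather than estimating it from screening data as the proposed method does.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binormal_tpr(fpr, a, b):
    """True-positive rate at a given false-positive rate (binormal model)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpr))


def partial_auc(a, b, fpr_lo, fpr_hi, n=2000):
    """Area under the binormal ROC curve for fpr_lo <= FPR <= fpr_hi.

    Uses the midpoint rule, which never evaluates inv_cdf at exactly
    0 or 1, where it is undefined.
    """
    if not 0.0 <= fpr_lo < fpr_hi <= 1.0:
        raise ValueError("need 0 <= fpr_lo < fpr_hi <= 1")
    h = (fpr_hi - fpr_lo) / n
    return h * sum(binormal_tpr(fpr_lo + (i + 0.5) * h, a, b) for i in range(n))


def partial_auc_ratio(digital, analog, fpr_lo=0.0, fpr_hi=0.02):
    """Digital-to-analog ratio of partial ROC areas (1 under the null)."""
    return partial_auc(*digital, fpr_lo, fpr_hi) / partial_auc(*analog, fpr_lo, fpr_hi)


if __name__ == "__main__":
    # Hypothetical (a, b) binormal parameters, not taken from the study design.
    analog = (1.5, 1.0)
    digital = (2.0, 1.0)
    print(f"partial-area ratio: {partial_auc_ratio(digital, analog):.2f}")
```

With identical parameters for both modalities the ratio is exactly 1, matching the null hypothesis; raising a for the digital curve pushes the ratio above 1. Restricting the integral to small false-positive rates keeps the comparison in the region relevant to biopsy recommendations.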
