Abstract

Background: For epidemiological research, cancer registry datasets often need to be augmented with additional data. Data linkage is not feasible when there are no cases in common between data sets. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis health behaviour and estimating its relationship with post-diagnosis survival time.

Methods: Six measures of pre-diagnosis health behaviours (focussing on tobacco smoking, ‘at risk’ alcohol consumption, overweight and exercise) were imputed for 28,000 cancer registry data records of US oesophageal cancers using cold deck imputation from an unrelated health behaviour dataset. Each data point was imputed twice. This calibration allowed us to estimate the misclassification rate. We applied statistical correction for the misclassification to estimate the relative risk of dying within 1 year of diagnosis for each of the imputed behaviour variables. Subgroup analyses were conducted for adenocarcinoma and squamous cell carcinoma separately.

Results: Simulated survival data confirmed that accurate estimates of true relative risks could be retrieved for health behaviours with greater than 5% prevalence, although confidence intervals were wide. Applied to real datasets, the estimated relative risks were largely consistent with current knowledge. For example, tobacco smoking status 5 years prior to diagnosis was associated with an increased age-adjusted risk of all-cause death within 1 year of diagnosis for oesophageal squamous cell carcinoma (RR = 1.99, 95% CI 1.24, 3.12) but not oesophageal adenocarcinoma (RR = 1.61, 95% CI 0.79, 2.57).

Conclusions: We have demonstrated a novel imputation-based algorithm for augmenting cancer registry data for epidemiological research which can be used when there are no cases in common between data sets. The algorithm allows investigation of research questions which could not be addressed through direct data linkage.
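The double cold deck imputation described in the Methods might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which variables were used to match registry records to donor records, so the matching keys (`age_group`, `sex`), the field name `smoker`, the function name `cold_deck_impute_twice`, and the choice to draw the two donors independently with replacement are all hypothetical.

```python
import random
from collections import defaultdict


def cold_deck_impute_twice(recipients, donors, keys, target, seed=0):
    """Impute `target` twice for each recipient from an external donor dataset.

    Assumption (not from the source): donors are matched to recipients on the
    stratum defined by `keys`, and the two draws are independent.
    """
    rng = random.Random(seed)

    # Group donor values by stratum, e.g. ("60-69", "M") -> [1, 0, 1, ...]
    pool = defaultdict(list)
    for donor in donors:
        pool[tuple(donor[k] for k in keys)].append(donor[target])

    imputed = []
    for rec in recipients:
        candidates = pool.get(tuple(rec[k] for k in keys))
        if not candidates:
            # No donor in this stratum: leave both imputations missing.
            imputed.append((None, None))
            continue
        # Two independent random draws from the same stratum.
        imputed.append((rng.choice(candidates), rng.choice(candidates)))
    return imputed


# Hypothetical toy data: a health behaviour survey (donors) and registry records.
donors = [
    {"age_group": "60-69", "sex": "M", "smoker": 1},
    {"age_group": "60-69", "sex": "M", "smoker": 1},
    {"age_group": "70-79", "sex": "F", "smoker": 0},
]
recipients = [
    {"age_group": "60-69", "sex": "M"},
    {"age_group": "70-79", "sex": "F"},
    {"age_group": "80+", "sex": "F"},
]
pairs = cold_deck_impute_twice(recipients, donors, ["age_group", "sex"], "smoker")
```

Each registry record receives a pair of imputed values. Agreement between the two values across records is what would allow a misclassification rate to be estimated, as the Methods describe.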

Highlights

  • For epidemiological research, cancer registry datasets often need to be augmented with additional data

  • The phi coefficients, φ, show that there is usually a positive correlation between the two imputed values, albeit weak. This confirms that some information about health behaviour is being conveyed through the random cold deck imputation

  • The value $n p_i (1 - p_i) \rho$, the number of correct matches greater than would be expected through chance, quantifies the information conveyed through the imputation
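The two quantities in the highlights above can be illustrated with a small sketch. It assumes that the two imputed values are coded 0/1, that ρ is estimated by the phi coefficient φ, and that p is the prevalence of the behaviour pooled over both imputations. These readings, and the function names `phi_coefficient` and `excess_matches`, are assumptions made for illustration rather than details taken from the paper.

```python
from math import sqrt


def phi_coefficient(first, second):
    """Phi coefficient between two equal-length 0/1 sequences."""
    a = sum(1 for x, y in zip(first, second) if x == 1 and y == 1)  # both 1
    b = sum(1 for x, y in zip(first, second) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(first, second) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(first, second) if x == 0 and y == 0)  # both 0
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a margin of the 2x2 table is zero")
    return (a * d - b * c) / denom


def excess_matches(first, second):
    """n * p * (1 - p) * phi: concordant positives beyond chance expectation.

    Assumes both imputations share (approximately) the same prevalence p,
    estimated here by pooling the two sequences.
    """
    n = len(first)
    p = (sum(first) + sum(second)) / (2 * n)
    return n * p * (1 - p) * phi_coefficient(first, second)


# Hypothetical pair of imputations for 10 records.
first_imputation = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
second_imputation = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
phi = phi_coefficient(first_imputation, second_imputation)
extra = excess_matches(first_imputation, second_imputation)
```

In this toy example both imputations have prevalence 0.3, so about 0.9 records would be positive on both draws by chance alone. Two records actually agree as positive, and the difference of 1.1 is what the formula returns.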

Introduction

Cancer registry datasets often need to be augmented with additional data. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis health behaviour and estimating its relationship with post-diagnosis survival time. In 2011 it was estimated that the cost of maintaining the United States’ National Program of Cancer Registries was US$60.77 per case [1]. The estimated number of new United States cancer cases was 1,291,451 in 1999 [2] and 1,762,450 in 2019 [3], an increase of 36% in 20 years. As with any public investment, there is a continuing need to maintain, and increase, the benefits of cancer registries relative to their costs. Since the 1990s, for example, the development of specialised data linkage infrastructure has opened up a wide range of new research applications [4]. There are still research questions which are waiting for a suitable method of analysis.
