Abstract

Background: Cancer registries worldwide are vital for determining cancer burden, planning cancer control measures, and facilitating research. The UICC has identified population-based cancer registries as a priority for low- and middle-income countries (LMICs); the National Cancer Registry Program (NCRP) of India oversees 28 such registries. A primary function of registries is to combine data for the same individual from multiple sources. For other disease cohorts in which cancer is an outcome of interest, registries can potentially connect information by linking datasets. Barriers to successful registration and linkage include cancer not being a notifiable disease, the absence of a universal unique individual identifier, and a lack of trained personnel. This study uses existing technology and infrastructure to improve linkages, surveillance, and outcomes.

Aim: To assess the feasibility of linking large cohorts designed for cardio-metabolic disease research with cancer registries in New Delhi and Chennai; to determine the additional steps required for linkage accuracy and completeness; and to develop detailed protocols for future applications.

Methods: A pilot protocol for linkage between a large diabetes cohort and the cancer registries in Delhi and Chennai was developed using MatchPro, a probabilistic record linkage program developed for cancer registries. Probabilistic software links datasets in the presence of uncertainty (e.g., misspelled or abbreviated names) by identifying record pairs with a high probability of representing the same individual. For this study, algorithms were developed to address unique aspects of names and demographics in India. The software and algorithms focused on two tasks: detecting duplicates within cancer registries, and linking registries with external files from diabetes cohorts. In Delhi, three 1-year datasets (2010, 2011, and 2012) were linked with the diabetes cohort; in Chennai, the linkage included three 5-year datasets covering 15 years (2000-2004, 2005-2009, and 2010-2014). India's unique ID (Aadhaar) is not currently collected or linked systematically across systems.

Results: Linkage attempts yielded potential matches ranked by probabilistic score; the highest-scoring pairs were reviewed manually to determine true matches. In Chennai, this process yielded the following: for 2010-2014, 21% of self-reported (SR) cases matched perfectly, 36% required follow-up, and 13 nonreported (NR) cases were found; for 2005-2009, 33% of SR cases matched perfectly and 1 NR case was found; for 2000-2004, 1 NR case was found. In addition, two training workshops on data linkage and the software were held.

Conclusion: Linkages between cancer registries and other data sources are feasible in LMICs using probabilistic record linkage software augmented by manual matching. Future efforts that combine existing epidemiologic resources (cohorts) with cancer research infrastructure (registries and clinical centers) can enhance research, including the understanding of shared risk factors and pathophysiologic mechanisms, e.g., between cancer and other noncommunicable diseases (NCDs).
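As an illustration of the probabilistic linkage described in the Methods, the following is a minimal sketch of Fellegi-Sunter-style scoring, a common approach to record linkage. It is not MatchPro's algorithm, which the abstract does not describe. The field names, the m/u agreement probabilities, and the name-similarity cutoff are all hypothetical values chosen for illustration, and the approximate name comparison uses Python's standard-library `difflib` rather than any India-specific name handling.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (illustrative only, not from the study):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "birth_year": (0.90, 0.02),
    "sex": (0.98, 0.50),
    "locality": (0.85, 0.05),
}

# Assumed cutoff above which two name strings are treated as agreeing.
NAME_SIMILARITY_THRESHOLD = 0.8


def normalize(value):
    """Lowercase, replace periods with spaces, and collapse whitespace."""
    if value is None:
        return ""
    return " ".join(str(value).lower().replace(".", " ").split())


def fields_agree(field, a, b):
    """Return True/False for agreement, or None if either value is missing."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return None
    if field == "name":
        # Approximate comparison tolerates misspellings and abbreviations.
        return SequenceMatcher(None, a, b).ratio() >= NAME_SIMILARITY_THRESHOLD
    return a == b


def match_score(rec_a, rec_b):
    """Sum log2 likelihood ratios over fields; higher means more likely a match."""
    score = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        agree = fields_agree(field, rec_a.get(field), rec_b.get(field))
        if agree is None:
            continue  # a missing field contributes no evidence either way
        if agree:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def rank_candidates(cohort_record, registry_records):
    """Return (score, registry_record) pairs, highest score first."""
    scored = [(match_score(cohort_record, r), r) for r in registry_records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_candidates` with one cohort record and a list of registry records returns candidates ordered by score. In a workflow like the one described above, the top-ranked pairs would then go to manual review rather than being accepted automatically.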

