Machine Learning Methods to Identify Missed Cases of Bladder Cancer in Population-Based Registries.

Anne-Michelle Noone,Mousumi Banerjee,Eric Boyd,Angela B Mariotto,Angela B Smith,Clara J K Lam,Matthew E Nielsen

doi:10.1200/cci.20.00170

Abstract

PURPOSEPopulation-based cancer incidence rates of bladder cancer may be underestimated. Accurate estimates are needed for understanding the burden of bladder cancer in the United States. We developed and evaluated the feasibility of a machine learning–based classifier to identify bladder cancer cases missed by cancer registries, and estimated the rate of bladder cancer cases potentially missed.METHODSData were from population-based cohort of 37,940 bladder cancer cases 65 years of age and older in the SEER cancer registries linked with Medicare claims (2007-2013). Cases with other urologic cancers, abdominal cancers, and unrelated cancers were included as control groups. A cohort of cancer-free controls was also selected using the Medicare 5% random sample. We used five supervised machine learning methods: classification and regression trees, random forest, logic regression, support vector machines, and logistic regression, for predicting bladder cancer.RESULTSRegistry linkages yielded 37,940 bladder cancer cases and 766,303 cancer-free controls. Using health insurance claims, classification and regression trees distinguished bladder cancer cases from noncancer controls with very high accuracy (95%). Bacille Calmette-Guerin, cystectomy, and mitomycin were the most important predictors for identifying bladder cancer. From 2007 to 2013, we estimated that up to 3,300 bladder cancer cases in the United States may have been missed by the SEER registries. This would result in an average of 3.5% increase in the reported incidence rate.CONCLUSIONSEER cancer registries may potentially miss bladder cancer cases during routine reporting. These missed cases can be identified leveraging Medicare claims and data analytics, leading to more accurate estimates of bladder cancer incidence.

Full Text