Abstract

Publications utilizing the National Cancer Database (NCDB) have risen over 1000-fold over the past decade. While some studies address identical questions, their sample sizes, and thus their conclusions, often differ. Furthermore, there is little correlation between NCDB analyses and level I data. Many of these publications do not adequately describe the methodology used to derive their sample from the initial dataset. Our goal in this analysis was to assess the rate of discordant information in the NCDB, primarily in TNM staging, to underscore the need to report methods adequately.

We utilized the NCDB Non-Small Cell Cancer Database to demonstrate the rate of errors and referenced the NCDB data dictionary to compile terms. We compared rates of differing information between the NCDB term "Tumor Size" and T stage ("TNM…"). We also compared group stage ("TNM_Stage Group") to N stage ("TNM_CLIN_N") and M stage ("TNM_CLIN_M"). We further assessed differences between "TNM_CLIN_M" and listed sites of metastatic disease, and examined whether recorded radiation administration was concordant with the doses given.

We evaluated the entire database, with an initial size of N = 121,930 patients. After restricting to patients with complete information on TNM and tumor size criteria, we evaluated concordance for 106,262 patients. We noted marked discordance between T stage and tumor size. Specifically, tumors staged as T1 did not meet their "Tumor Size" criteria 12.5% of the time, and T2 tumors did not meet size criteria in 8.5% of instances. Patients staged as M0 were given a group stage of IV in 2.5% of instances, while in 0.75% of instances a group stage of IV was given when there was no identified metastatic site. For N stage, there was an error rate of approximately 2.6% between reported stages. With regard to "RAD Administered", a dose was reported in 1.7% of cases in which it was listed that no radiation was administered.

This demonstration shows that a proper explanation of how the primary sample is derived is paramount. How one builds the data set (e.g., utilizing tumor size vs. T1) could alter sample sizes and lead to the variance noted among NCDB studies evaluating similar questions. A comprehensive description, or attachment, of the precise methodology can bring greater overall transparency to the analysis of large oncology databases.
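The kind of concordance check described above can be illustrated with a minimal sketch. It is not the authors' code. The record keys (`t_stage`, `tumor_size_mm`, `m_stage`, `stage_group`, `rad_administered`, `rad_dose`) are hypothetical stand-ins rather than NCDB field names, and the size limits per T category are illustrative assumptions, since real T staging also depends on invasion and on the staging edition in use. Rates here are computed over all supplied records, which is a simplification of the per-category rates reported in the abstract.

```python
from collections import Counter

# Assumed upper size limits in millimetres per T category (illustrative only;
# actual staging rules depend on the staging edition and on tumor invasion).
MAX_SIZE_MM = {"T1": 30, "T2": 70}


def discordance_flags(record, max_size_mm=MAX_SIZE_MM):
    """Return the set of consistency checks that a single record fails."""
    flags = set()

    # T stage vs. recorded tumor size: flag tumors larger than the assumed limit.
    limit = max_size_mm.get(record.get("t_stage"))
    size = record.get("tumor_size_mm")
    if limit is not None and size is not None and size > limit:
        flags.add("t_vs_size")

    # M stage vs. group stage: M0 should not coexist with group stage IV.
    if record.get("m_stage") == "M0" and record.get("stage_group") == "IV":
        flags.add("m0_but_stage_iv")

    # Radiation flag vs. dose: a positive dose with "no radiation" is discordant.
    if record.get("rad_administered") is False and (record.get("rad_dose") or 0) > 0:
        flags.add("dose_without_radiation")

    return flags


def discordance_rates(records):
    """Fraction of all supplied records failing each check."""
    records = list(records)
    counts = Counter(flag for r in records for flag in discordance_flags(r))
    return {flag: n / len(records) for flag, n in counts.items()}


# Hypothetical toy records, invented purely to exercise the checks.
TOY_RECORDS = [
    {"t_stage": "T1", "tumor_size_mm": 45, "m_stage": "M0",
     "stage_group": "I", "rad_administered": False, "rad_dose": 0},
    {"t_stage": "T1", "tumor_size_mm": 20, "m_stage": "M0",
     "stage_group": "IV", "rad_administered": False, "rad_dose": 0},
    {"t_stage": "T2", "tumor_size_mm": 50, "m_stage": "M0",
     "stage_group": "II", "rad_administered": False, "rad_dose": 60},
    {"t_stage": "T2", "tumor_size_mm": 40, "m_stage": "M1",
     "stage_group": "IV", "rad_administered": True, "rad_dose": 30},
]

if __name__ == "__main__":
    print(discordance_rates(TOY_RECORDS))
```

Keeping each check as a named flag makes it straightforward to report which exclusion rule removed which patients, which is the kind of methodological detail the abstract argues should accompany NCDB analyses.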
