Abstract

Publicly available datasets, for example those accessible via cBioPortal for Cancer Genomics, could be a valuable source for benchmarks and comparisons with local patient records. However, such an approach is only valid if the patient cohorts are comparable and the documentation is complete and sufficient. In this paper, records of patients with exocrine pancreatic cancer documented in a local cancer registry are compared with two public datasets with respect to overall survival. Several data preprocessing steps were necessary to ensure comparability of the datasets, and a common database schema was created. Our assumption that the public datasets could be used to augment the data of the local cancer registry could not be validated, since the analysis of overall survival showed a significant difference. We discuss several possible explanations for this finding. For now, comparisons between different datasets, and medical conclusions drawn from them, should be treated with great caution.
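To make the notion of comparing overall survival between cohorts concrete, the following is a minimal sketch of a Kaplan-Meier estimate applied to two cohorts. The abstract does not state which estimator or test the authors used, so Kaplan-Meier is an assumption chosen here because it is a common way to estimate overall survival. The function name `kaplan_meier`, the variable names, and the toy numbers are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with at least one death.

    durations: follow-up time per patient (e.g. months since diagnosis)
    events:    1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both drop out after time t
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohorts, NOT data from the study
local_registry = kaplan_meier([2, 4, 4, 7, 10], [1, 1, 0, 1, 0])
public_dataset = kaplan_meier([5, 8, 12, 15, 20], [1, 0, 1, 1, 0])

for name, curve in [("local", local_registry), ("public", public_dataset)]:
    print(name, [(t, round(s, 3)) for t, s in curve])
```

A sketch like this only yields the two curves. Judging whether they differ significantly would additionally require a statistical test such as the log-rank test, and it presupposes that both cohorts were first mapped to the same definitions of start date, event, and censoring.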
