Abstract

Background: Routine clinical data from the electronic medical record are indispensable for retrospective and prospective observational studies and clinical trials. Their reproducibility is often not assessed. We sought to develop a prostate cancer-specific database with a defined source hierarchy for clinical annotations and to evaluate data reproducibility.
Methods: At a comprehensive cancer center, we designed and implemented a clinical database for men with prostate cancer and clinical-grade paired tumor–normal sequencing. For these patients, we performed team-based retrospective clinical data annotation from the electronic medical record, using a prostate cancer-specific data dictionary. We developed an open-source R package for data processing. We then evaluated completeness of data elements, reproducibility of team-based annotation using blinded repeat annotation by a medical oncologist as the reference, and the impact of measurement error on bias in survival analyses.
Results: Data elements on demographics, diagnosis and staging, disease state at the time of procuring a genomically characterized sample, and clinical outcomes were piloted and then abstracted for 2,261 patients and their 2,631 genomically profiled samples. Completeness of data elements was generally high, ranging from 55% to 99% for elements of clinical TNM staging, self-reported race, biopsy Gleason score, and presence of variant histologies, both for the team-based annotation and the repeat annotation. Comparing team-based annotation to the repeat annotation (100 patients/samples), reproducibility of annotations was high to very high. For 7 binary data elements, both sensitivity and specificity of the team-based annotation reached or exceeded 90%. The T stage, metastasis date, and presence and date of castration resistance had lower reproducibility. Impact of measurement error on estimates for strong prognostic factors was modest.
Conclusions: With a prostate cancer-specific data dictionary and quality control measures, manual team-based annotations can be scalable and reproducible. The data dictionary and the R package for reproducible data processing (https://stopsack.github.io/prostateredcap) are freely available to help increase data quality in clinical prostate cancer research.
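
As an illustration of the reproducibility metric reported above, the following minimal Python sketch computes sensitivity and specificity of a binary team-based annotation against a repeat annotation treated as the reference. It is not taken from the study or from the prostateredcap R package; the function name `sensitivity_specificity` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    team: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `team` relative to `reference`.

    Hypothetical helper for illustration only: `reference` stands in for
    the blinded repeat annotation, `team` for the team-based annotation.
    """
    if len(team) != len(reference):
        raise ValueError("annotations must cover the same patients/samples")

    pairs = list(zip(team, reference))
    true_pos = sum(1 for t, r in pairs if t and r)
    false_neg = sum(1 for t, r in pairs if not t and r)
    true_neg = sum(1 for t, r in pairs if not t and not r)
    false_pos = sum(1 for t, r in pairs if t and not r)

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("reference must contain both positives and negatives")

    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy data (made up, not from the study): 10 samples, one binary element.
reference_annotation = [True, True, True, True, False, False, False, False, False, False]
team_annotation = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(team_annotation, reference_annotation)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this sketch, a data element would meet the 90% threshold mentioned in the Results only if both returned values were at least 0.90.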

