Abstract

The emergence of large real-world clinical databases and tools to mine electronic medical records has allowed for an unprecedented look at large data sets with clinical and epidemiologic correlates. In clinical cancer genetics, real-world databases allow for the investigation of prevalence and effectiveness of prevention strategies and targeted treatments and for the identification of barriers to better outcomes. However, real-world data sets have inherent biases and problems (eg, selection bias, incomplete data, measurement error) that may hamper adequate analysis and affect statistical power. Here, we leverage a real-world clinical data set from a large health network for patients with breast cancer tested for variants in BRCA1 and BRCA2 (N = 12,423). We conducted data cleaning and harmonization, cross-referenced with publicly available databases, performed variant reassessment and functional assays, and used functional data to inform a variant's clinical significance applying American College of Medical Geneticists and the Association of Molecular Pathology guidelines. In the cohort, White and Black patients were over-represented, whereas non-White Hispanic and Asian patients were under-represented. Incorrect or missing variant designations were the most significant contributor to data loss. While manual curation corrected many incorrect designations, a sizable fraction of patient carriers remained with incorrect or missing variant designations. Despite the large number of patients with clinical significance not reported, original reported clinical significance assessments were accurate. Reassessment of variants in which clinical significance was not reported led to a marked improvement in data quality. We identify the most common issues with BRCA1 and BRCA2 testing data entry and suggest approaches to minimize data loss and keep interpretation of clinical significance of variants up to date.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call