Abstract
11165 Background: The main objective of an electronic health record (EHR) is to standardize clinical workflow effectively. However, EHR data are also used extensively as secondary data for clinical research and quality improvement. Healthcare data exist in different forms, such as paper records, digital images, and notes, and only some of these are recorded as structured data in the EHR. It is therefore essential to curate detailed information from unstructured data so that the patient’s journey can be understood in depth for research and quality studies. Methods: Patients diagnosed with metastatic breast cancer (mBC) between 01-Jan-2020 and 31-Dec-2022 were identified from the Integra Connect PrecisionQ de-identified database, which covers 3 million cancer patients across 500 sites of care (~80% community oncology and ~20% academic practices). Manual curation was conducted for a sample of these patients, and the fill rates of key data elements were captured as part of this study. Results: A total of 13,763 mBC patients were identified during the study period. Data availability for the different variables was assessed from structured data in 13,120 patients, and additional information was obtained by manual curation for 643 patients. Staging and grade information was available for 62.8% of patients in the structured data, while curation increased the availability of these details to 99.5% among the 643 curated patients. The fill rates for tumor size and nodal status, as defined by the T and N values, were 99% in the curated data compared with 60% in the structured data. Similarly, fill rates for estrogen receptor, progesterone receptor, and HER2 status exceeded 98% with curation, compared with approximately 65% in the structured data. HER2-low, HER2-negative, and HER2-positive status was identified in 8%, 45.2%, and 10.1% of patients, respectively, by structured data, and in 57.1%, 84.1%, and 14.0%, respectively, by curated data.
Furthermore, the fill rate for HER2 immunohistochemistry (IHC) tests was 97.1% in the curated data compared with 49.0% in the structured data, while the fill rate for in-situ hybridization (ISH) or fluorescence in-situ hybridization (FISH) tests was 61.3% in the curated data compared with 28.4% in the structured data. This discrepancy arises because IHC and FISH/ISH results are primarily found in PDF documents rather than in structured fields. Conclusions: This study highlights the need for curation to maximize the use of EHR data for secondary research and quality studies. Natural language processing and augmented curation methods can further enhance the quality of EHR data for secondary research.
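The structured-versus-curated fill-rate comparison described in the Results can be illustrated with a minimal Python sketch. It assumes that a fill rate is the percentage of patients with a non-missing value for a given data element; the record layout, field names (`stage`, `her2_ihc`), and toy values are hypothetical and are not drawn from the PrecisionQ database or the study's methods.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical patient record: maps a data element name to its value,
# with None marking a value that is missing for that patient.
Record = Mapping[str, Optional[str]]


def fill_rate(records: Sequence[Record], element: str) -> float:
    """Percentage of records in which `element` is present and not None."""
    if not records:
        raise ValueError("records must be non-empty")
    filled = sum(1 for r in records if r.get(element) is not None)
    return 100.0 * filled / len(records)


def compare_fill_rates(
    structured: Sequence[Record],
    curated: Sequence[Record],
    elements: Sequence[str],
) -> dict[str, tuple[float, float]]:
    """Map each element to (structured fill rate, curated fill rate)."""
    return {e: (fill_rate(structured, e), fill_rate(curated, e)) for e in elements}


if __name__ == "__main__":
    # Toy data for illustration only; these are NOT the study's records.
    structured = [
        {"stage": "IV", "her2_ihc": None},
        {"stage": None, "her2_ihc": "1+"},
        {"stage": "IV", "her2_ihc": None},
        {"stage": None, "her2_ihc": None},
    ]
    curated = [
        {"stage": "IV", "her2_ihc": "2+"},
        {"stage": "IV", "her2_ihc": "1+"},
        {"stage": "IV", "her2_ihc": "0"},
        {"stage": "IV", "her2_ihc": None},
    ]
    rates = compare_fill_rates(structured, curated, ["stage", "her2_ihc"])
    for element, (s, c) in rates.items():
        print(f"{element}: structured {s:.1f}%, curated {c:.1f}%")
```

With the toy data, `stage` is filled for 50.0% of structured records and 100.0% of curated records, and `her2_ihc` for 25.0% and 75.0%, mirroring in miniature the kind of gap the abstract reports.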