e13640

Background: The Making Advances in Mammography and Medical Options for Veterans Act of 2022 mandated that the Department of Veterans Affairs (VA) perform a study of germline genetic testing in Veterans with breast cancer (BC). Assessing from electronic health records whether a Veteran with BC was offered germline genetic testing requires time-consuming review of unstructured clinical notes by clinical experts whose time is already committed to providing care. To expedite this review, we implemented a productionized natural language processing (NLP) pipeline that identifies relevant concepts and highlights them within a custom-made chart review interface.

Methods: Clinical subject-matter experts (SMEs) were convened to identify keywords suggestive of genetic testing. The keywords were categorized into 3 sets: gene-related concepts, VA-specific services, and exams and tests. We then used entity recognition and entity linking to link these concepts to a knowledge base. The concepts were enriched with entities similarly extracted from an open-source reference clinical text on germline genetic testing in BC. The resulting concepts were visualized as a word cloud and reviewed by the SMEs. We implemented a Power BI and Power Apps interface to display clinical notes from 200 randomly selected Veterans, with relevant concepts highlighted. Each Veteran’s notes were reviewed independently by 2 of the 5 SMEs, who assessed whether the Veteran was offered testing, was referred to a genetic counseling service, or otherwise had evidence of testing. Disagreements were adjudicated by the whole team.

Results: A total of 11,622 notes were reviewed (mean 58 notes per Veteran and 3028 characters per note). The table shows the most prevalent entities among genes, mutations, and exam names. Of the 5 reviewers, 3 annotated 80 patients each in a single session that lasted less than 50 minutes on average. Reviewers had an inter-annotator agreement of 85.5%. Mention of target concepts was associated with germline genetic testing with an accuracy of 80.4%, precision of 76.4%, recall of 99.1%, and F1 of 86.3%. Germline genetic testing rates are detailed in a separate submission.

Conclusions: Modern informatics tools such as productionized NLP, the Microsoft Power Platform, and high-performance computing environments allow for rapid implementation of health information technology tools that enable clinicians to gain critical insights quickly and efficiently from electronic health record data. [Table: see text]
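
For readers who want to see how the reported evaluation figures relate to one another, the following is a minimal sketch, not the study's actual code. It assumes the standard definition of F1 as the harmonic mean of precision and recall, and it assumes that inter-annotator agreement means simple percent agreement, which the abstract does not specify. The function names and the example labels are hypothetical.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percent_agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Fraction of items on which two reviewers gave the same label.

    Assumption: the abstract's "inter-annotator agreement" is treated here
    as simple percent agreement; the source does not name the statistic.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_score(0.764, 0.991)
print(f"F1 from reported precision and recall: {reported_f1 * 100:.1f}%")

# Hypothetical labels from two reviewers, for illustration only.
reviewer_1 = ["offered", "not offered", "offered", "offered"]
reviewer_2 = ["offered", "not offered", "not offered", "offered"]
print(f"Percent agreement: {percent_agreement(reviewer_1, reviewer_2):.2f}")
```

Under these assumptions, the reported precision of 76.4% and recall of 99.1% reproduce the reported F1 of 86.3%.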