Abstract

Electronic medical records (EMR) hold potential for transformative improvements in the quality and efficiency of healthcare delivery. However, the unstructured nature of EMR data often necessitates manual review and is a significant barrier to leveraging it for downstream analysis. Methods to address this challenge include natural language processing (NLP) techniques such as tokenization, shallow parsing, and boundary/negation detection. Recent radiation oncology (RO)-specific attempts to use NLP to structure EMR data have involved identifying common toxicity terms in on-treatment visit (OTV) notes. Classical NLP works well for positive toxicity term identification (i.e., toxicity present) but is suboptimal for negated symptoms (i.e., toxicity absent). Convolutional neural networks (CNNs) aid context detection; however, no publicly available RO-specific CNNs that identify toxicity terms exist. We hypothesized that RO-specific CNNs, naively trained on OTV notes, could improve tabulation of positive and negative toxicity information to an accuracy level suitable for automation.

OTV notes (n = 3789) for prostate cancer (PCa) patients treated at our institution between 2019 and 2021 were identified and analyzed for inclusion, exclusion, or omission of Common Terminology Criteria for Adverse Events (CTCAE) toxicity terms. The top 5 terms identified (fatigue, nausea, diarrhea, dysuria, hematuria) were used for further analysis. Each note was manually classified by explicit positive identification, negation, or omission of each toxicity term, and these labels were used to train in-house, toxicity-term-specific CNNs. Each algorithm was structured as a 3-class classification problem distinguishing positive symptom identification, negative symptom identification, and omission of the term. The gold standard for accuracy measurement was manual review of each OTV note for the presence, absence, or omission of each toxicity term. Overall out-of-sample accuracy and F1 score were determined using a train/test split of the OTV notes.

The 3-class accuracy and F1 score of the CNNs for the top 5 CTCAE terms (present/absent/omitted) in PCa OTV notes were: fatigue (accuracy = 0.93, F1 = 0.95), diarrhea (accuracy = 0.95, F1 = 0.94), nausea (accuracy = 0.98, F1 = 1.0), dysuria (accuracy = 0.97, F1 = 0.97), and hematuria (accuracy = 0.99, F1 = 0.96).

Training naïve CNNs with RO-specific training data from OTV notes increased the accuracy of CTCAE toxicity coding. This approach addresses challenges previously encountered when applying classical NLP to RO EMR data. Therefore, use of CNNs in NLP may reduce barriers to implementing automated methods that improve data extraction for retrospective and prospective analyses.
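
As an illustration of the evaluation described above, the following minimal Python sketch computes accuracy and an F1 score for a 3-class labeling of notes. It is not the authors' code. The label names (`positive`, `negated`, `omitted`), the function names, and the six example labels are hypothetical, and the abstract does not state how F1 was averaged across the three classes, so macro averaging is assumed here.

```python
# Hypothetical 3-class label set for one toxicity term in an OTV note.
LABELS = ("positive", "negated", "omitted")


def accuracy(y_true, y_pred):
    """Fraction of notes whose predicted label matches the manual label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as 'positive' (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0.0 when the class never appears in either list.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    return sum(class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Made-up held-out labels for illustration only; not data from the study.
manual = ["positive", "negated", "omitted", "negated", "positive", "omitted"]
predicted = ["positive", "negated", "omitted", "positive", "positive", "omitted"]

print(f"accuracy = {accuracy(manual, predicted):.2f}")
print(f"macro F1 = {macro_f1(manual, predicted):.2f}")
```

In this toy example one negated mention is misread as positive, which is the kind of error the abstract attributes to classical NLP. Under a different averaging scheme (for example, weighting each class by its frequency) the F1 value would differ, which is why the choice is flagged as an assumption.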
