Abstract
11133
Background: Natural language processing (NLP) parsers are algorithms designed to extract values of interest from free text stored in Electronic Medical Records (EMR). Because much of the information used in oncology research is stored as free text, it is crucial to understand how accurately NLP algorithms extract oncology-specific measures of interest. We validate NLP-parsers that identify three measures critical to oncology research by evaluating how closely the parser results match manual chart abstractor review: AJCC summary cancer stage group (cancer stage), AJCC TNM stage (TNM stage), and surgical treatment of cancer (surgery).
Methods: Following deployment of the parsers on 8,000 non-identifiable free-text notes housed in an aggregated non-identifiable U.S. EMR database, manual chart abstraction was performed on a random selection of notes (n=300) dated between 2008 and 2023 to validate the parsers’ capture of cancer stage, TNM stage, and surgical treatment of cancer. We report true positives (TP), false positives (FP), and positive predictive value (PPV), where PPV = TP / (TP + FP).
Results: The PPV of the AJCC cancer stage parser was 99.0% (n=297 TP, n=3 FP). The PPV of the TNM stage parser was 97.3% for tumor stage (n=292 TP, n=8 FP), 99.0% for nodal stage (n=297 TP, n=3 FP), and 96.7% for presence of metastases (n=290 TP, n=10 FP). The PPV of the surgery parser was 87.8% (n=468 TP, n=65 FP).
Conclusions: These NLP-parsers identified the measures of interest with PPVs of 96.7% to 99.0% for cancer stage and TNM stage and 87.8% for surgery, providing confidence in the use of these derived measures, which are central to oncology research. Validation is a necessary initial step in processing real-world data for real-world evidence generation, and it carries unique considerations in oncology research because key information is documented in EMR free text rather than in structured data fields.
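The reported PPV values can be reproduced from the TP and FP counts in the Results. The following is a minimal sketch, assuming the standard definition PPV = TP / (TP + FP); the function name `ppv` and the dictionary `reported` are illustrative and not part of the study's tooling.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    total = tp + fp
    if total == 0:
        raise ValueError("PPV is undefined when there are no positive calls")
    return tp / total


# (TP, FP) counts copied from the Results; the labels are illustrative.
reported = {
    "cancer stage": (297, 3),
    "tumor stage (T)": (292, 8),
    "nodal stage (N)": (297, 3),
    "metastases (M)": (290, 10),
    "surgery": (468, 65),
}

for measure, (tp, fp) in reported.items():
    # Print each PPV as a percentage, plus the number of parser-positive calls.
    print(f"{measure}: PPV = {ppv(tp, fp):.1%} (n = {tp + fp})")
```

Run as written, this gives 99.0%, 97.3%, 99.0%, 96.7%, and 87.8%, matching the Results. The surgery counts sum to 533 rather than 300; the abstract does not state the reason for the different denominator.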