Abstract
Few studies have applied natural language processing (NLP) to non-small cell lung cancer (NSCLC). This study aimed to validate an NLP model on an NSCLC cohort by extracting NSCLC concepts from free-text medical notes and converting them to structured, interpretable data. Patients with a lung neoplasm, NSCLC histology, and treatment information in their notes were selected from a repository of over 27 million patients. From these, 200 patients were randomly selected, and the longest and the most recent note for each patient were included. An NLP model developed and validated on a large oncology cohort of solid tumors and blood cancers was applied to this NSCLC cohort. Two certified tumor registrars and a curator abstracted the following concepts from the notes: neoplasm, histology, stage, TNM values, and metastasis sites. This manually abstracted gold standard was compared with the NLP model output, and precision and recall were calculated. The NLP model extracted the NSCLC concepts with high precision and recall, respectively: lung neoplasm, 100% and 100%; NSCLC histology, 99% and 88%; histology correctly linked to neoplasm, 98% and 79%; stage value, 98.8% and 92%; stage TNM value, 93% and 98%; and metastasis site, 97% and 89%. High precision reflects a low number of false positives, so the extracted concepts are likely accurate. High recall indicates that the model captured most of the desired concepts. This study validates that Optum's oncology NLP model has high precision and recall on real-world clinical data and is a reliable model to support research studies and clinical trials. It also shows that our nonspecific solid tumor and blood cancer oncology model generalizes to extracting clinical information from specific cancer cohorts.
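To make the two metrics concrete, the following is a minimal sketch of how precision and recall can be computed when comparing model output against a manually abstracted gold standard. It assumes each extracted concept is represented as a (patient, concept type, value) triple and that a match requires all three to agree; the function name `precision_recall`, the triple representation, and the sample records are hypothetical illustrations, not the study's data or scoring procedure.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for predicted concepts against a gold standard.

    Both arguments are iterables of hashable items; here each item is an
    assumed (patient_id, concept_type, value) triple.
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)    # extracted and present in the gold standard
    false_pos = len(predicted - gold)   # extracted but absent from the gold standard
    false_neg = len(gold - predicted)   # in the gold standard but not extracted

    # Guard against empty denominators (no predictions or no gold concepts).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical records for illustration only (not study data).
gold = {
    ("p1", "histology", "adenocarcinoma"),
    ("p2", "histology", "squamous cell carcinoma"),
    ("p3", "stage", "IV"),
    ("p4", "stage", "IIIA"),
}
predicted = {
    ("p1", "histology", "adenocarcinoma"),
    ("p3", "stage", "IV"),
    ("p4", "stage", "IIIA"),
    ("p5", "stage", "II"),  # false positive: not in the gold standard
}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example one predicted concept is a false positive and one gold concept is missed, so both metrics fall below 100%. Exact-match scoring on triples is only one possible choice; a real evaluation might normalize values or allow partial matches.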