Abstract

e13576

Background: Free-text clinical narratives contain rich patient information that is labor-intensive to extract through chart review. We developed a natural language processing (NLP) pipeline to automatically extract performance status (PS), staging, and diagnosis from the clinical narratives of Veterans Affairs (VA) patients with lymphoid malignancies (LM).

Methods: A rule-based NLP algorithm was developed and iteratively refined using a development corpus of 287 notes independently annotated by two clinicians. On this corpus, the F1-score was 95.8 for PS (precision 98.6, recall 93.2), 92.7 for staging (precision 94.0, recall 81.6), and 67 for diagnosis (precision 80.2, recall 57.9). The pipeline was then externally validated using an evaluation corpus of 97 notes from another group of 100 veterans with T-cell LM.

Results: In the 97 notes, primary diagnosis was the most routinely documented element, with 2.76 mentions per note. In comparison, staging was the most sparsely documented, with only 34 mentions (note that 11 patients with large granular lymphocytic leukemia were not staged). The pipeline performed relatively well in extracting PS and staging (F1-scores of 0.74 and 0.72, respectively). It also achieved high precision in extracting diagnosis information (precision 0.93). However, recall for diagnosis was poor (0.44), likely due to the complexity and inconsistency of how diagnoses are documented for LM. [Table: Frequency of documentation and performance in the external validation set; see text]

Conclusions: The pipeline shows promising performance on the external validation set, demonstrating the feasibility of using NLP to extract information from the notes of patients with LM for clinical research. Recall was generally lower than precision, indicating that the pipeline may miss clinical information that should be captured. False positives (FPs) arise when the pipeline captures entities that are easily confused with the clinical entities of interest, such as nutritional status versus performance status. Future work includes capturing more lexical variations and indicators of documentation, as well as contextual information such as the note sections in which elements are likely to be documented. In addition, we describe how a diagnosis may appear as primary, secondary, or part of a differential, and we are building an NLP-based classifier to distinguish between these types of diagnoses. We will use results from the rule-based NLP pipeline as labels to fine-tune transformer-based, weakly supervised models to further enhance performance.
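The precision, recall, and F1 figures quoted above are related by the standard harmonic-mean definition of F1. The following is a minimal sketch of that relationship, not the authors' evaluation code; the function name `f1_score` is a hypothetical helper introduced here, and the sketch assumes each reported F1 is computed directly from the precision and recall quoted beside it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Works on either scale (0-1 or 0-100) as long as both inputs
    use the same one. Returns 0.0 when both inputs are zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Development-corpus figures quoted in the abstract (percent scale).
ps_f1 = f1_score(98.6, 93.2)         # performance status
diagnosis_f1 = f1_score(80.2, 57.9)  # diagnosis

# External-validation diagnosis figures (0-1 scale). The abstract does not
# report this F1; the value is derived here only for illustration.
external_diagnosis_f1 = f1_score(0.93, 0.44)

print(f"PS F1: {ps_f1:.1f}")
print(f"Diagnosis F1: {diagnosis_f1:.1f}")
print(f"External diagnosis F1 (derived): {external_diagnosis_f1:.2f}")
```

Under this definition, the PS and diagnosis values reproduce the reported development-corpus F1-scores after rounding. Applying the same formula to the staging figures (precision 94.0, recall 81.6) gives roughly 87.4 rather than the reported 92.7, so that value may have been aggregated differently or one of the three staging numbers may be mis-transcribed.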
