Abstract

1557

Background: Patients with prostate cancer are diagnosed through a prostate needle biopsy (PNB). The information contained in PNB pathology reports is critical for clinical risk stratification and treatment decisions; however, patient comprehension of these reports is low, and report formats vary widely by institution. Natural language processing (NLP) models trained to automatically extract key information from unstructured PNB pathology reports could be used to generate personalized educational materials for patients in a scalable fashion and to expedite the collection of registry data or the screening of patients for clinical trials. As a proof of concept, we trained and tested four NLP models for accuracy of information extraction.

Methods: Using 403 positive PNB pathology reports from over 80 institutions, we converted portable document format (PDF) files to text with the Tesseract optical character recognition (OCR) engine, removed protected health information with the Philter open-source tool, cleaned the text with rule-based methods, and annotated clinically relevant attributes, as well as structural attributes relevant to information extraction, using the Brat Rapid Annotation Tool. Text pre-processing for classification and extraction was done with scispaCy and rule-based methods. Using a 75:25 train:test split (N = 302, 101), we tested conditional random field (CRF), support vector machine (SVM), bidirectional long short-term memory (Bi-LSTM), and Bi-LSTM-CRF models, reserving 46 training reports as a validation subset for the latter two models. Model-extracted variables were compared with values manually obtained from the unprocessed PDF reports to assess clinical accuracy.

Results: The clinical accuracy of model-extracted variables is reported in the Table. CRF was the highest-performing model, with accuracies of 97% for Gleason grade, 82% for percentage of positive cores (<50% vs. ≥50%), 90% for perineural or lymphovascular invasion, and 100% for presence of non-acinar carcinoma histology. On manual review of inaccurate results, model performance was limited by PDF image quality, errors in OCR processing of tables or columns, and practice variability in reporting the number of biopsy cores.

Conclusions: Our results demonstrate a successful proof of concept for the use of NLP models to accurately extract information from PNB pathology reports, though further optimization is needed before use in clinical practice. [Table: see text]
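The rule-based extraction step mentioned in the Methods can be illustrated with a minimal sketch. The regular expressions, variable names, and dichotomization threshold below are hypothetical illustrations of the general approach (pattern matching over OCR-cleaned report text), not the authors' actual rules; the study's primary results came from trained CRF/SVM/Bi-LSTM models.

```python
import re

# Hypothetical patterns for two of the report-level variables described
# above: Gleason grade and percent positive cores (dichotomized at 50%).
GLEASON_RE = re.compile(r"gleason\s*(?:score)?\s*(\d)\s*\+\s*(\d)", re.IGNORECASE)
CORES_RE = re.compile(r"(\d+)\s*(?:of|/)\s*(\d+)\s*cores?", re.IGNORECASE)

def extract_variables(report_text: str) -> dict:
    """Pull Gleason grade and dichotomized percent positive cores from text."""
    out = {"gleason": None, "cores_ge_50pct": None}

    m = GLEASON_RE.search(report_text)
    if m:
        primary, secondary = int(m.group(1)), int(m.group(2))
        out["gleason"] = f"{primary}+{secondary}={primary + secondary}"

    m = CORES_RE.search(report_text)
    if m:
        positive, total = int(m.group(1)), int(m.group(2))
        if total:
            out["cores_ge_50pct"] = positive / total >= 0.5

    return out

sample = "Adenocarcinoma, Gleason score 3+4, involving 2 of 12 cores."
print(extract_variables(sample))
# → {'gleason': '3+4=7', 'cores_ge_50pct': False}
```

A sketch like this also hints at the failure modes noted in the Results: OCR errors in tables (e.g., a misread "/" or digit) break the patterns silently, and institutions that report core counts per site rather than per specimen defeat a single fixed rule.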
