Automated Text Mining of Prostate Pathology Reports Extracted from an Electronic Medical System, using a Rule-Based Approach

R. Karunamuni, V. Nalawade, A. Bruggeman, A.B. Hopper, J.D. Murphy, J.P. Einck, B.S. Rose

doi:10.1016/j.ijrobp.2018.07.877

Open Access

Abstract

Retrospective analysis of large-scale prostate cancer databases to identify trends in cancer care and outcomes requires detailed histopathologic characterization. Manual extraction of these characteristics from prostate pathology reports is time-consuming and prone to human error. Our goal was to develop a rule-based software algorithm capable of automatically extracting pertinent characteristics (primary and secondary Gleason grade, maximum core involvement, and number of positive and total cores) from prostate pathology reports. Prostate pathology reports were manually extracted from our institution’s electronic medical record system for 135 patients. The dataset was split into training and testing sets of 110 and 25 patients, respectively. The training set was examined by hand to identify patterns and linguistic features that could be used to build the rules for extracting the clinical characteristics. During the training phase, it was noted that a different set of rules was required for outside slide reviews, as these often presented the results in summary format rather than the core-by-core breakdown used in reports authored at our institution. The rules were then implemented in the statistical software platform R. The performance of the algorithm was assessed in both the training and testing sets by comparing the software predictions of the clinical characteristics with those made by a human observer. Of the 135 pathology reports, 29 were outside slide reviews (23 in training and 6 in testing). The algorithm correctly identified the primary and secondary Gleason grade, as well as the maximum core involvement percentage, in 100% (135/135) of the data set. Due to ambiguity in reporting, four reports were excluded from the total core analysis and two were excluded from the positive core analysis. The algorithm correctly identified the total cores in 95% (124/131) of the data set and the positive cores in 98% (131/133) of the data set. The testing set accuracy for the total and positive cores was 83% (19/23) and 96% (23/24), respectively. Analysis of errors revealed that four of the seven incorrectly identified total cores and both incorrectly identified positive cores were in outside slide reviews. The rule-based software algorithm was able to correctly extract and identify the primary and secondary Gleason grades, maximum core involvement, and number of positive and total cores in the majority of the examined prostate pathology reports. Standardized reporting, including a core-by-core breakdown, may lead to improved accuracy of text-mining algorithms and mitigate the need for human registrars.
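
The abstract does not publish the rules themselves, so the following is only a minimal sketch of what rule-based extraction from a core-by-core report could look like. It is written in Python rather than R, and everything in it is an assumption made for illustration: the report layout (one lettered line per core), the regular expressions, the choice to report the core with the highest Gleason sum (ties broken by the higher primary grade), and the names `summarize_report`, `ReportSummary`, and `SAMPLE_REPORT`. It does not handle the summary-format outside slide reviews described above.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical layout: each core is one line that starts with a letter label, e.g. "A. ".
CORE_LINE = re.compile(r"\s*[A-Z]\.\s")
# Matches "Gleason 3+4", "Gleason score 4 + 3", "Gleason grade: 3+3", etc.
GLEASON = re.compile(r"gleason\s*(?:score|grade)?\s*:?\s*(\d)\s*\+\s*(\d)", re.IGNORECASE)
# Matches a percentage such as "40%" or "15 %".
PERCENT = re.compile(r"(\d{1,3})\s*%")


@dataclass
class ReportSummary:
    primary: Optional[int]
    secondary: Optional[int]
    max_involvement_pct: Optional[int]
    positive_cores: int
    total_cores: int


def summarize_report(text: str) -> ReportSummary:
    """Apply the illustrative rules line by line and aggregate per report."""
    best: Optional[Tuple[int, int]] = None
    max_pct: Optional[int] = None
    positive = total = 0
    for line in text.splitlines():
        if not CORE_LINE.match(line):
            continue  # not a core line under the assumed layout
        total += 1
        grade = GLEASON.search(line)
        if grade is None:
            continue  # assumed rule: a core with no Gleason pattern is negative
        positive += 1
        pair = (int(grade.group(1)), int(grade.group(2)))
        # Assumed rule: keep the highest Gleason sum, then the higher primary grade.
        if best is None or (sum(pair), pair[0]) > (sum(best), best[0]):
            best = pair
        # Look for the involvement percentage after the Gleason pattern.
        pct = PERCENT.search(line, grade.end())
        if pct is not None:
            value = int(pct.group(1))
            max_pct = value if max_pct is None else max(max_pct, value)
    primary, secondary = best if best is not None else (None, None)
    return ReportSummary(primary, secondary, max_pct, positive, total)


# Made-up example text, not taken from any real report.
SAMPLE_REPORT = """
A. Left apex: Adenocarcinoma, Gleason 3+4=7, involving 40% of core
B. Left mid: Benign prostatic tissue
C. Right base: Adenocarcinoma, Gleason score 4+3=7, involving 15% of core
D. Right apex: Benign prostatic tissue
"""

if __name__ == "__main__":
    print(summarize_report(SAMPLE_REPORT))
```

A sketch like this also shows why a summary-format report would need its own rule set: counting lettered lines only works when each core has its own line, which is the situation the abstract's closing recommendation on standardized core-by-core reporting addresses.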

Journal: International Journal of Radiation Oncology*Biology*Physics
Publication Date: Oct 20, 2018
Citations: 1

