Abstract

Histopathologic features are critical for studying risk factors for colorectal polyps, but they remain embedded in unstructured pathology reports, so using them for research requires costly and time-consuming manual abstraction. In this study, we developed and evaluated a natural language processing (NLP) pipeline that automatically extracts histopathologic features of colorectal polyps from pathology reports, with an emphasis on individual polyp size. The extracted data were then linked with structured electronic health record (EHR) data to create an analysis-ready epidemiologic data set. We obtained 24,584 pathology reports from colonoscopies performed at the University of Utah's Gastroenterology Clinic. Two investigators annotated 350 reports to determine inter-rater agreement, develop an annotation scheme, and create a reference standard for performance evaluation. We then developed the pipeline and compared its performance against the reference standard for extracting polyp location, histology, size, shape, dysplasia, and number of polyps. Finally, the pipeline was applied to 24,225 unseen reports, and the NLP-extracted data were linked with structured EHR data. Across all features, the pipeline achieved a precision of 98.9%, a recall of 98.0%, and an F1-score of 98.4%. In patients with polyps, the pipeline correctly extracted 95.6% of sizes, 97.2% of polyp locations, 97.8% of histology, 98.3% of shapes, and 98.3% of dysplasia levels. On the unseen data, the pipeline classified 12,889 patients as having polyps and 4,907 patients as having no polyps, and it extracted the features of 28,387 polyps. Tubular adenomas were the most common subtype (55.9%), 8.1% of polyps were advanced adenomas, and the mean polyp size was 0.57 (±0.4) cm. Our pipeline extracted histopathologic features of colorectal polyps from colonoscopy pathology reports, most notably individual polyp sizes, with high accuracy.
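The F1-score reported above is the harmonic mean of precision P and recall R. Substituting the rounded pooled values reproduces the 98.4% figure (the small rounding in P and R does not change the third significant digit):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.989 \times 0.980}{0.989 + 0.980}
    = \frac{1.93844}{1.969}
    \approx 0.984
```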
This study demonstrates the utility of NLP for extracting polyp features and linking them with EHR data to create an epidemiologic data set for studying colorectal polyp risk factors and outcomes.
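The abstract does not describe how individual polyp sizes are parsed. As a minimal, hypothetical sketch (not the authors' pipeline), a rule-based extractor could match mentions such as "5 mm" or "1.2 x 0.8 cm", keep the largest dimension of each mention, and normalize it to centimetres, the unit used for the reported mean size. The names `SIZE_RE` and `extract_sizes_cm` and the "largest dimension" rule are assumptions made for illustration.

```python
import re

# Hypothetical pattern: one or more dimensions ("5", "1.2 x 0.8") followed by mm or cm.
SIZE_RE = re.compile(
    r"(\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)*)\s*(mm|cm)\b",
    re.IGNORECASE,
)


def extract_sizes_cm(text: str) -> list[float]:
    """Return the largest dimension of each size mention, converted to centimetres.

    Assumption (illustrative only): the largest reported dimension stands in for polyp size.
    """
    sizes = []
    for dims, unit in SIZE_RE.findall(text):
        # Split "1.2 x 0.8" into its individual dimensions.
        values = [float(v) for v in re.split(r"\s*x\s*", dims, flags=re.IGNORECASE)]
        largest = max(values)
        # Normalize millimetres to centimetres; centimetre values pass through unchanged.
        sizes.append(largest / 10 if unit.lower() == "mm" else largest)
    return sizes


if __name__ == "__main__":
    report = (
        "Sigmoid colon, polypectomy: tubular adenoma, 5 mm. "
        "Ascending colon: sessile serrated lesion measuring 1.2 x 0.8 cm."
    )
    print(extract_sizes_cm(report))
```

A real report would also need each size tied to the correct polyp and location, which a single regular expression like this does not attempt.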
