Automated Extraction of Grade, Stage, and Quality Information From Transurethral Resection of Bladder Tumor Pathology Reports Using Natural Language Processing

Alexander P Glaser,Joshua J Meeks,Brian J Jordan,Jason Cohen,Anuj Desai,Philip Silberman

doi:10.1200/po.17.00128.2019.test

Abstract

PurposeBladder cancer is initially diagnosed and staged with a transurethral resection of bladder tumor (TURBT). Patient survival is dependent on appropriate sampling of layers of the bladder, but pathology reports are dictated as free text, making large-scale data extraction for quality improvement challenging. We sought to automate extraction of stage, grade, and quality information from TURBT pathology reports using natural language processing (NLP).MethodsPatients undergoing TURBT were retrospectively identified using the Northwestern Enterprise Data Warehouse. An NLP algorithm was then created to extract information from free-text pathology reports and was iteratively improved using a training set of manually reviewed TURBTs. NLP accuracy was then validated using another set of manually reviewed TURBTs, and reliability was calculated using Cohen’s κ.ResultsOf 3,042 TURBTs identified from 2006 to 2016, 39% were classified as benign, 35% as Ta, 11% as T1, 4% as T2, and 10% as isolated carcinoma in situ...

Full Text