Abstract

Determining surveillance intervals for patients with colorectal polyps is critical but time-consuming and difficult to do reliably. We present the development and evaluation of a pipeline that uses natural language processing techniques to automatically extract and analyze relevant polyp findings from free-text colonoscopy and pathology reports. Based on this information, the pipeline assigns each patient to 1 of 6 postcolonoscopy surveillance intervals defined by the U.S. Multi-Society Task Force on Colorectal Cancer. Working with 546 randomly selected colonoscopy and pathology reports from 324 patients in a single health system, we combined statistical classifiers and rule-based methods to extract polyp properties from each report type, associate those properties with individual polyps, and classify each patient into 1 of 6 risk categories by integrating information from both report types. We assessed the pipeline's performance by computing the positive predictive value (PPV), sensitivity, and F-score of the algorithm against a gastroenterologist's determination of surveillance intervals. The pipeline was developed on 346 reports (224 colonoscopy and 122 pathology) from 224 patients and evaluated on an independent test set of 200 reports (100 colonoscopy and 100 pathology) from 100 patients. For colonoscopy reports, the average PPV, sensitivity, and F-score across targeted entities were .92, .95, and .93, respectively; for pathology reports, they were .95, .97, and .96. The system assigned the recommended surveillance colonoscopy interval with an overall accuracy of 92%. This study demonstrates the feasibility of using machine learning to automatically extract findings and classify patients into appropriate risk categories and corresponding surveillance intervals.
Incorporating this system can facilitate proactive and timely follow-up after screening colonoscopy and enable real-time quality assessment of prevention programs and providers.
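For readers checking the reported figures, the metrics above follow their standard definitions: PPV is TP / (TP + FP), sensitivity is TP / (TP + FN), and the F-score is the harmonic mean of the two. The Python sketch below is illustrative only; the `Counts` type and the helper names are hypothetical, and it assumes that interval accuracy is the fraction of patients whose assigned interval matches the gastroenterologist's. The abstract does not say how the per-entity averages were computed, so the sketch does not try to reproduce them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Counts:
    """Hypothetical tally for one extracted entity type."""
    tp: int  # extracted values that match the reference annotation
    fp: int  # extracted values with no matching reference annotation
    fn: int  # reference values the system failed to extract


def ppv(c: Counts) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def sensitivity(c: Counts) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f_score(p: float, r: float) -> float:
    """Balanced F-score: harmonic mean of PPV and sensitivity."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of patients whose predicted interval equals the reference one."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    # Consistency check on the pathology figures: PPV .95 and sensitivity .97.
    print(round(f_score(0.95, 0.97), 2))  # 0.96
```

Under these assumptions, a PPV of .95 and a sensitivity of .97 combine to an F-score of about .96, matching the pathology result, and .92 with .95 gives about .93 for colonoscopy. If accuracy is computed over the 100 test patients, 92% corresponds to 92 patients whose assigned interval agreed with the reference.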
