Abstract
Key variables recorded as text in colonoscopy and pathology reports have been extracted using natural language processing (NLP) tools that were not easily adaptable to new settings. We aimed to develop a reliable NLP tool with broad adaptability. During 1996-2016, Kaiser Permanente Northern California performed 401,566 colonoscopies with linked pathology. We randomly sampled 1000 linked reports into a Training Set and developed an NLP tool using SAS® PERL regular expressions. The NLP tool captured five colonoscopy and pathology variables: type, size, and location of polyps; extent of procedure; and quality of bowel preparation. We used a Validation Set (N = 3000) to confirm the variables' classifications using manual chart review as the reference. Performance of the NLP tool was assessed using the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Cohen's κ. Cohen's κ ranged from 93 to 99%. The sensitivity and specificity ranged from 95 to 100% across all categories. For categories with prevalence exceeding 10%, the PPV ranged from 97% to 100% except for adequate quality of preparation (prevalence 92%), for which the PPV was 65%. For categories with prevalence below 10%, the PPVs ranged from 62% to 100%. NPVs ranged from 94% to 100% except for the "complete" extent of procedure, for which the NPV was 73%. Using information from a large community-based population, we developed a transparent and adaptable NLP tool for extracting five colonoscopy and pathology variables. The tool can be readily tested in other healthcare settings.
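The performance measures named in the abstract all derive from a 2×2 table that compares the NLP tool's classification of a category with manual chart review. The Python sketch below is illustrative only and is not the authors' SAS implementation. The names `ConfusionCounts` and `agreement_metrics` are hypothetical, and the example counts are invented for demonstration rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts for one category, with chart review as the reference."""
    tp: int  # NLP positive, chart review positive
    fp: int  # NLP positive, chart review negative
    fn: int  # NLP negative, chart review positive
    tn: int  # NLP negative, chart review negative


def agreement_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, NPV, prevalence and Cohen's kappa.

    Assumes every row and column total of the 2x2 table is non-zero.
    """
    n = c.tp + c.fp + c.fn + c.tn
    observed = (c.tp + c.tn) / n
    # Chance agreement: product of the marginal proportions, summed over
    # the "positive" and "negative" classes.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)
    ) / (n * n)
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "prevalence": (c.tp + c.fn) / n,
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for a single category (not study data).
example = agreement_metrics(ConfusionCounts(tp=90, fp=5, fn=10, tn=895))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV are computed from the NLP-positive and NLP-negative totals, respectively, they depend on category prevalence as well as on sensitivity and specificity, which is why the abstract reports them separately for categories above and below 10% prevalence.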