Abstract
Despite the existence of established standards and guidelines for pathology reporting, many pathology reports are still written in unstructured free text. Extracting information from these reports and formatting it according to a standard is crucial for consistent interpretation. Automated information extraction from unstructured pathology reports is challenging, as it requires accurate interpretation of medical terminology and context-dependent details. In this work, we present a practical approach for automatically extracting information from unstructured pathology reports or scanned paper reports using a large multimodal model. The framework uses context-aware prompting strategies to extract the values of individual fields, such as grade and size, from pathology reports. A unique feature of the proposed approach is that it assigns each extracted field a confidence value indicating the likely correctness of the model's extraction, and it generates a structured report, in line with national pathology guidelines, in both human- and machine-readable formats. We have analysed extraction performance in terms of accuracy and kappa scores, as well as the quality of the confidence scores assigned by the model. We have also evaluated the prognostic value of the extracted fields and of feature embeddings of the raw text. Results showed that the model can extract information accurately, with accuracy and kappa scores of up to 0.99 and 0.98, respectively. Our results indicate that confidence scores are an effective indicator of the correctness of the extracted information, achieving an area under the receiver operating characteristic curve of up to 0.93 and thus enabling automatic flagging of extraction errors. Our analysis further reveals that, as expected, information extracted from pathology reports is highly prognostically relevant. The framework demo is available at: https://labieb.dcs.warwick.ac.uk/.
The code for the proposed approach, together with the information it extracted from pathology reports of colorectal cancer cases in The Cancer Genome Atlas (TCGA), is available at: https://github.com/EtharZaid/Labieb.
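The evaluation described in the abstract (per-field accuracy and kappa, plus the ROC AUC of confidence scores as an indicator of correctness) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each extracted field is stored with a model-assigned confidence in [0, 1], treats "kappa" as Cohen's kappa, and computes the AUC with the pairwise (Mann-Whitney) formulation. The `ExtractedField` record, the field names, the 0.5 review threshold and all toy values are hypothetical and are not taken from the paper or the Labieb repository.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExtractedField:
    """Hypothetical record for one field of a structured report."""
    name: str
    value: str
    confidence: float  # model-assigned confidence in [0, 1] (assumed scale)


def accuracy(truth, pred):
    """Fraction of fields whose extracted value matches the reference."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohen_kappa(truth, pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(truth)
    labels = set(truth) | set(pred)
    p_o = accuracy(truth, pred)
    # Agreement expected if reference and extraction were independent.
    p_e = sum((truth.count(c) / n) * (pred.count(c) / n) for c in labels)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def roc_auc(confidences, is_correct):
    """ROC AUC of confidence as a detector of correct extractions.

    Pairwise form: probability that a randomly chosen correct extraction
    has higher confidence than a randomly chosen incorrect one, with ties
    counted as one half.
    """
    pos = [s for s, ok in zip(confidences, is_correct) if ok]
    neg = [s for s, ok in zip(confidences, is_correct) if not ok]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect extractions")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def flag_for_review(fields, threshold=0.5):
    """Names of fields whose confidence falls below a (hypothetical) threshold."""
    return [f.name for f in fields if f.confidence < threshold]


# Toy values for illustration only; they are not results from the paper.
truth = ["G2", "G2", "G3", "G1", "G2"]
pred = ["G2", "G2", "G3", "G2", "G2"]
conf = [0.95, 0.90, 0.85, 0.40, 0.80]
correct = [t == p for t, p in zip(truth, pred)]

print(accuracy(truth, pred))   # 0.8
print(cohen_kappa(truth, pred))
print(roc_auc(conf, correct))  # 1.0

fields = [
    ExtractedField("grade", "G2", 0.95),
    ExtractedField("tumour_size_mm", "35", 0.40),
]
print(flag_for_review(fields))  # ['tumour_size_mm']
# Machine-readable view of the (hypothetical) structured report.
print(json.dumps([asdict(f) for f in fields], indent=2))
```

With these toy values the single incorrect extraction also has the lowest confidence, so the AUC is 1.0 and a simple threshold flags it for review. In practice the threshold would be chosen to balance manual review workload against missed extraction errors.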