Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

Hyeongmin Cho, Sooyoung Yoo, Borham Kim, Sowon Jang, Leonard Sunwoo, Sanghwan Kim, Donghyoung Lee, Seok Kim, Sejin Nam, Jin-Haeng Chung

doi:10.1016/j.jbi.2024.104720

Abstract

Background: In oncology, electronic health records contain textual key information for the diagnosis, staging, and treatment planning of patients with cancer. However, text data processing requires a lot of time and effort, which limits the utilization of these data. Recent advances in natural language processing (NLP) technology, including large language models, can be applied to cancer research. Particularly, extracting the information required for the pathological stage from surgical pathology reports can be utilized to update cancer staging according to the latest cancer staging guidelines.

Objectives: This study has two main objectives. The first objective is to evaluate the performance of extracting information from text-based surgical pathology reports and determining pathological stages based on the extracted information using fine-tuned generative language models (GLMs) for patients with lung cancer. The second objective is to determine the feasibility of utilizing relatively small GLMs for information extraction in a resource-constrained computing environment.

Methods: Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard with validation by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs which then were used to extract information required for staging from pathology reports.

Results: We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information. The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best performance overall, with an exact match ratio of 92.24% in the information extraction problem and an accuracy of 0.9876 (predicting T and N classification concurrently) in classification.

Conclusion: This study demonstrated that training GLMs with deductive datasets can improve information extraction performance, and GLMs with a relatively small number of parameters at approximately seven billion can achieve high performance in this problem. The proposed GLM-based information extraction method is expected to be useful in clinical decision-making support, lung cancer staging and research.
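The two figures reported in the Results can be illustrated with a minimal Python sketch. The abstract does not define either metric, so this sketch assumes that the exact match ratio counts a report as correct only when every extracted descriptor equals the gold standard, and that the concurrent T and N accuracy counts a case as correct only when both labels match. The function names, the descriptor keys, and the sample values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# One report's descriptors as key/value strings (keys here are hypothetical).
Descriptors = Dict[str, str]


def exact_match_ratio(gold: List[Descriptors], pred: List[Descriptors]) -> float:
    """Fraction of reports whose predicted descriptors all equal the gold ones.

    Assumption: matching is per report, not per descriptor.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of reports")
    if not gold:
        return 0.0
    matches = sum(1 for g, p in zip(gold, pred) if g == p)
    return matches / len(gold)


def joint_tn_accuracy(
    gold: List[Tuple[str, str]], pred: List[Tuple[str, str]]
) -> float:
    """Fraction of cases where the T label and the N label are both correct."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of cases")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Illustrative data only; descriptor names and values are made up.
gold_reports = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.1", "pleural_invasion": "present"},
]
pred_reports = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.1", "pleural_invasion": "absent"},  # one descriptor wrong
]
gold_tn = [("T1c", "N0"), ("T2b", "N1")]
pred_tn = [("T1c", "N0"), ("T2b", "N0")]  # N label wrong in the second case

print(exact_match_ratio(gold_reports, pred_reports))
print(joint_tn_accuracy(gold_tn, pred_tn))
```

Under the per-report assumption, a single wrong descriptor makes the whole report a miss, so this reading is stricter than a per-descriptor score.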

Journal: Journal of Biomedical Informatics
Publication Date: Sep 1, 2024
License type: cc-by-nc-nd

Similar Papers

Automated Extraction of Tumor Staging and Diagnosis Information From Surgical Pathology Reports.
Sajjad Abedian ... Jonathan E Shoag
JCO Clinical Cancer Informatics | VOL. 5
01 Dec 2021

What's New in Staging of Lung Cancer?
James R Jett
Chest | VOL. 111
01 Jun 1997

Facilitating cancer research using natural language processing of pathology reports.
Kristin Anderson ... Victor R Grann
Studies in health technology and informatics | VOL. 107
25 Jun 2015

Quality of Breast Cancer Surgical Pathology Reports
Anita Vallacha ... Dinesh Kumar
Asian Pacific Journal of Cancer Prevention : APJCP | VOL. 19
01 Jan 2018
