Abstract
e15567 Background: Retrospective studies of cancer treatment require manual review of voluminous imaging reports to identify the best response and the timing of disease progression. Here, we employed natural language processing, specifically deep bidirectional transformers, to automate the extraction of treatment response data. Methods: Data were sourced from the Yonsei Cancer Data Library database, focusing on patients with stage II-IV colorectal cancer at the Yonsei Cancer Center from Jan 1, 2010, to Dec 31, 2020. Imaging reports from 6,574 patients were gathered, amounting to 97,119 CT readings. Among these, 9,000 CT reports corresponding to 2,859 patients were randomly selected and manually assigned multilabel annotations by four radiology experts, based on the RECIST version 1.1 classification (CR/NED, PR, SD, PD). The pretrained BERT-base-uncased model was fine-tuned for the downstream task of multilabel classification. A total of 6,765 reports were used for training, while the remaining 1,000 reports were divided equally between the validation and test sets. An algorithm for integrating and extracting treatment response data was also implemented. Clinical data were meticulously collected for 47 patients with stage IV early-onset colorectal cancer (EOCRC) receiving palliative first-line chemotherapy. The primary objective was to evaluate the model’s accuracy in predicting the timing of disease progression events within a ±30-day window. Secondary objectives focused on the model’s ability to identify the best response and its timing within a ±45-day window. Results: Classification performance evaluated across 1,000 individual CT reports, categorized as CR/NED (319), PR (86), SD (272), and PD (298), yielded an AUROC of 0.956 and an F1 score of 0.823.
All EOCRC patients were initially diagnosed with stage IV disease, and 45 out of 47 (95.7%) underwent metastasectomy either before initiating chemotherapy or during first-line chemotherapy. In clinical evaluation, the model demonstrated 72.3% (95% Confidence Interval [CI], 59.5-85.1) accuracy in predicting the day of disease progression within a ±30-day window. It achieved 55.3% (95% CI, 41.1-69.5) accuracy in predicting the best response category and its timing within a ±45-day window. Notably, the model was more accurate in predicting CR/NED (72%) than SD and PR, which had accuracies of 27.3% and 15.4%, respectively. Conclusions: Our model demonstrated high performance in classifying CT reports, which is expected to provide rapid insights into patient progress and improve clinical decision-making. However, the complexity involved in integrating and extracting data results in slight constraints in performance. This emphasizes the need for further research into multimodal large language models that can integrate a wider variety of clinical data.
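The window-based accuracy reported above can be illustrated with a minimal sketch. The abstract does not state how the confidence intervals were computed. The normal-approximation (Wald) interval used here is an assumption, as are the hit counts of 34/47 and 26/47, which are inferred from the reported percentages. Under those assumptions the bounds come out within about 0.1 percentage point of the reported intervals. The function names are hypothetical and not taken from the study.

```python
from datetime import date
from math import sqrt
from typing import Iterable, Tuple


def within_window(predicted: date, actual: date, window_days: int) -> bool:
    """True if the predicted event date is within +/- window_days of the actual date."""
    return abs((predicted - actual).days) <= window_days


def window_hits(pairs: Iterable[Tuple[date, date]], window_days: int) -> int:
    """Count (predicted, actual) date pairs that fall inside the window."""
    return sum(within_window(p, a, window_days) for p, a in pairs)


def wald_ci(hits: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) 95% CI.

    Assumption: the abstract does not name its CI method; Wald is used
    here only because it is the simplest common choice.
    """
    p = hits / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts inferred from the reported 72.3% and 55.3% of 47 patients.
for label, hits in [("PD timing, +/-30 days", 34), ("best response, +/-45 days", 26)]:
    acc, lo, hi = wald_ci(hits, 47)
    print(f"{label}: {acc:.1%} (95% CI, {lo:.1%}-{hi:.1%})")
```

With only 47 patients the intervals are wide (roughly ±13 to ±14 percentage points), which is worth keeping in mind when comparing the per-category accuracies.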