Background and Objective: Colorectal cancer is one of the leading causes of cancer death worldwide. TNM staging is essential for prognosis and treatment planning because it describes how far colorectal cancer has advanced. However, manual TNM staging from colon magnetic resonance imaging (MRI) is a laborious and error-prone process. This study introduces an automated text classification system for TNM staging from colon MRI reports written in Spanish.
Methods: A dataset of 1319 Spanish colon MRI reports was collected and manually labeled with TNM staging. To automate TNM staging, a multimodal system was proposed. The system is based on a RoBERTa language model pre-trained on a combination of biomedical and clinical Spanish-language corpora, and it uses Natural Language Processing (NLP) techniques to extract relevant categorical and numerical features from the MRI reports.
Results: The system was evaluated with several metrics, and the results are promising: the best of the proposed systems reached macro F1-scores of 0.7464, 0.8792 and 0.6776 for T, N and M, respectively.
Conclusions: This study demonstrates the feasibility of using a language model for automatic TNM staging based on Spanish clinical reports of colorectal cancer patients. The proposed system could be a useful tool for improving the efficiency and accuracy of colorectal cancer diagnosis.
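The results are reported as macro F1-scores, computed separately for the T, N and M labels. As a reminder of what that metric measures, the following is a minimal sketch in plain Python: it computes F1 per class and takes the unweighted mean, so rare stages count as much as common ones. The function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration; this is not the authors' evaluation code, and it assumes single-label classification per category.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for the T category (not data from the study).
t_true = ["T2", "T3", "T3", "T4", "T2", "T3"]
t_pred = ["T2", "T3", "T4", "T4", "T3", "T3"]
score = macro_f1(t_true, t_pred)
print(f"macro F1 (T): {score:.4f}")
```

Because every class contributes equally to the average, a macro F1-score can be noticeably lower than accuracy when some stages are rare in the data.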