Segmentation prompts classification: A nnUNet-based 3D transfer learning framework with ROI tokenization and cross-task attention for esophageal cancer T-stage diagnosis

Chen Li, Runyuan Wang, Ping He, Wei Chen, Wei Wu, Yi Wu

doi:10.1016/j.eswa.2024.125067

Abstract

Computer-aided diagnosis systems for esophageal cancer (EC) play a vital role in EC diagnosis and treatment decision-making, with a primary focus on accurate segmentation of EC-related organs and classification of EC’s T-stage. These two tasks are closely related, and both are crucial for helping surgeons delineate and diagnose cancer early. However, this paradigm is still in its infancy and is limited by two closely related open issues: (1) how to exploit the complementary relationship between the two tasks to improve their originally poor performance, and (2) how to determine from CT images whether the tumor has invaded the surrounding muscle layers. To address these issues, this study develops nn-TransEC, a 3D transfer learning framework that builds upon nnU-Net and synergizes segmentation and classification. nn-TransEC focuses on prompting fine-grained classification of EC’s T-stage with the aid of prior segmentation, and is implemented in two parts: (1) an nnUNet-configured multi-task learning network (nn-MTNet), designed for complementary segmentation of EC-related organs and classification of EC’s T-stages using cross-task attention gates and transfer learning; and (2) a knowledge-embedded ROI tokenization method (KRT), defined to mimic the diagnostic workflow of doctors when classifying EC’s T-stage. KRT crops the regions of greatest diagnostic interest from the entire CT volume based on the prior segmentation. Experiments were conducted on a private dataset collected from 169 patients with pathologically confirmed EC. nn-TransEC is compared against state-of-the-art counterparts (e.g., nnU-Net and nnFormer), and the results show that it outperforms all compared methods in multi-organ segmentation and classification of EC’s T-stages, with the 3D Dice of EC and the average AUC of T-stages reaching 0.844 and 0.941, respectively, versus 0.814 and 0.927 for the state-of-the-art method nnFormer. nn-TransEC also outperforms state-of-the-art multi-task learning models in joint segmentation and classification, with the Hausdorff Distance of EC and the average precision of T-stages reaching 8.497 and 0.845, respectively, versus 12.206 and 0.730 for the state-of-the-art method TransMT-Net.
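
The abstract says that KRT crops the regions of interest from the CT volume based on a prior segmentation. The sketch below illustrates only that general idea of segmentation-guided cropping, not the authors' KRT implementation: it takes the bounding box of one label in a segmentation mask, pads it by a margin, and crops the volume to it. The function name `crop_roi`, the `label` and `margin` parameters, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def crop_roi(volume, seg, label=1, margin=4):
    """Crop `volume` to the bounding box of `label` in `seg`, padded by `margin` voxels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if volume.shape != seg.shape:
        raise ValueError("volume and segmentation must have the same shape")

    # Voxel coordinates of the target label, shape (n_voxels, 3).
    coords = np.argwhere(seg == label)
    if coords.size == 0:
        raise ValueError(f"label {label} not present in segmentation")

    # Bounding box with margin, clipped to the volume extent.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, np.array(volume.shape))

    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return volume[slices], slices


# Synthetic stand-in for a CT volume and a prior segmentation mask.
volume = np.random.default_rng(0).normal(size=(32, 64, 64))
seg = np.zeros(volume.shape, dtype=np.uint8)
seg[10:14, 20:30, 25:40] = 1  # assumed "tumor" label

roi, slices = crop_roi(volume, seg, label=1, margin=2)
print(roi.shape)  # (8, 14, 19)
```

Returning the slices together with the crop makes it possible to map any result computed on the ROI back to its position in the full volume.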

Full Text