Abstract

Accurately segmenting thyroid nodules in medical ultrasound images is non-trivial due to the large variation in nodule size, shape, and texture, as well as the presence of surrounding tissues and organs with similar intensity values. Existing algorithms address this by designing multi-scale spatial context modeling modules and adopting the emerging self-attention paradigm, but they fail to learn the class-wise relational context around nodules. To this end, we propose a novel coarse-to-fine framework, CRSANet, which leverages both class-wise relational context and multi-scale spatial context for comprehensive context perception. Specifically, we first introduce the concept of Class Representations, which describe the overall representation of thyroid nodules in ultrasound images from a categorical perspective. We then propose a backbone-free, dual-branch Class Representation Self-Attention (CRSA) module to refine the coarse segmentation results produced by segmentation backbones. Concretely, the Fine Class Region (FCR) branch imposes consistency between pixels and their corresponding Class Representations to correct misclassified pixels, while the Class Contextual Representation (CCR) branch enriches pixel representations with their corresponding Class Representations. We evaluate the effectiveness and generality of CRSANet using three types of backbone networks. In our experiments, the proposed U-CRSANet achieves mIoU values of 78.86% and 81.35%, accuracies of 97.11% and 98.63%, and Dice scores of 74.26% and 87.57% on the public DDTI and TN-SCUI datasets, respectively, outperforming most state-of-the-art methods.
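To make the idea of class-wise relational context concrete, the following is a minimal PyTorch sketch of one pixel-to-class attention step in the spirit of the CCR branch described above: per-class representations are pooled from pixel features using the coarse segmentation probabilities as soft assignment weights, and each pixel then attends over those class representations to refine its own feature. All module and variable names here are hypothetical; this illustrates the general technique under our own assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassRepresentationAttention(nn.Module):
    """Hypothetical sketch of pixel-to-class attention (CCR-style refinement)."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Linear(channels, channels)
        self.value = nn.Linear(channels, channels)
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, coarse_logits: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) pixel features from a segmentation backbone.
        # coarse_logits: (B, K, H, W) coarse per-class scores from the backbone head.
        b, c, h, w = feats.shape
        k = coarse_logits.shape[1]
        probs = F.softmax(coarse_logits, dim=1)              # soft class assignments
        probs_flat = probs.view(b, k, h * w)                 # (B, K, HW)
        feats_flat = feats.view(b, c, h * w)                 # (B, C, HW)
        # Class Representations: probability-weighted average of pixel features.
        class_reps = torch.bmm(probs_flat, feats_flat.transpose(1, 2))       # (B, K, C)
        class_reps = class_reps / (probs_flat.sum(-1, keepdim=True) + 1e-6)
        # Each pixel attends over the K class representations.
        q = self.query(feats).view(b, c, h * w).transpose(1, 2)              # (B, HW, C)
        key = self.key(class_reps)                                           # (B, K, C)
        val = self.value(class_reps)                                         # (B, K, C)
        attn = torch.softmax(q @ key.transpose(1, 2) / c ** 0.5, dim=-1)     # (B, HW, K)
        ctx = (attn @ val).transpose(1, 2).view(b, c, h, w)  # class context per pixel
        # Fuse the original features with their class-contextual counterpart.
        return self.out(torch.cat([feats, ctx], dim=1))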
