Deep Supervision Strategy Research Articles

Medical image segmentation aims to separate an object of interest from the surrounding tissues and structures, which is essential for the reliable diagnosis and morphological analysis of specific lesions. Deep Convolutional Neural Networks (CNNs) have significantly advanced automatic medical image segmentation. However, CNNs usually fail to model long-range interactions because of the intrinsic locality of convolutional operations, which limits segmentation performance. Recently, the Transformer has been applied successfully to various computer vision tasks, where it uses the self-attention mechanism to model long-range interactions and capture global information. Nevertheless, self-attention lacks spatial locality and is computationally expensive. To address these issues, we develop a new sparse medical Transformer (SMTF) with multiscale contextual fusion for medical image segmentation. The proposed model combines convolutional operations and attention mechanisms in a U-shaped framework to capture both local and global information. Specifically, to reduce the computational cost of the traditional Transformer, we design a novel sparse attention module that builds Transformer layers using the spherical Locality Sensitive Hashing method. The sparse attention partitions the feature space into attention buckets, and attention is computed only within each bucket. The sparse Transformer layer is then combined with a bottleneck block to construct the encoder in SMTF. Notably, the proposed sparse Transformer can also aggregate global feature information in early stages, which enables the model to learn more local and global information by incorporating CNNs at lower layers. Furthermore, we introduce a deep supervision strategy to guide the model to fuse multiscale feature information. It also enables SMTF to propagate feature information effectively across layers, preserving more input spatial information and mitigating information attenuation. As a result, SMTF achieves better segmentation performance while being more robust and efficient. SMTF is evaluated on multiple medical image segmentation datasets and a clinical nasopharyngeal carcinoma dataset. Extensive experiments demonstrate its superiority in both qualitative and quantitative evaluations. Code and models are available at https://github.com/qmx717/sparse-attention.git.
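The bucketed attention described in the abstract above can be illustrated with a small sketch. This is not the SMTF implementation (see the linked repository for that); it assumes a simple angular hash in which each normalized vector is projected onto a few random directions and assigned to the bucket of its largest signed projection. The function names, the `n_projections` and `seed` parameters, and the use of plain Python lists are all hypothetical choices made for readability.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def lsh_bucket(v, projections):
    """Assumed angular hash: index of the largest entry of [Rv, -Rv]."""
    scores = [sum(a * b for a, b in zip(v, r)) for r in projections]
    scores = scores + [-s for s in scores]
    return max(range(len(scores)), key=scores.__getitem__)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bucketed_attention(queries, keys, values, n_projections=2, seed=0):
    """Scaled dot-product attention restricted to keys in the query's bucket."""
    rng = random.Random(seed)
    dim = len(queries[0])
    projections = [[rng.gauss(0, 1) for _ in range(dim)]
                   for _ in range(n_projections)]
    q_buckets = [lsh_bucket(normalize(q), projections) for q in queries]
    k_buckets = [lsh_bucket(normalize(k), projections) for k in keys]

    outputs = []
    for q, qb in zip(queries, q_buckets):
        # Only keys hashed to the same bucket take part in this query's softmax.
        idx = [j for j, kb in enumerate(k_buckets) if kb == qb]
        if not idx:
            outputs.append([0.0] * len(values[0]))
            continue
        logits = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
                  for j in idx]
        weights = softmax(logits)
        outputs.append([sum(w * values[j][c] for w, j in zip(weights, idx))
                        for c in range(len(values[0]))])
    return outputs, q_buckets, k_buckets
```

With shared queries and keys, similar tokens tend to fall into the same bucket, so each softmax runs over a small group rather than the full sequence. That is the source of the cost reduction the abstract refers to.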

Purpose: Accurate segmentation of cardiac structures on coronary CT angiography (CCTA) images is crucial for morphological analysis, measurement, and functional evaluation. In this study, we achieve accurate automatic segmentation of cardiac structures on CCTA images with a deep learning method based on a visual attention mechanism and a transformer network, and we discuss its practical application value. Methods: We developed a dual-input deep learning network based on visual saliency and transformer (VST), which uses a self-attention mechanism for cardiac structure segmentation. CCTA scans from sixty patients were randomly selected as a development set and manually annotated by an experienced technician. The proposed visual attention and transformer model was trained on the patients' CCTA images, with a binary mask derived from the manual contours used as the learning target. We also used a deep supervision strategy by adding auxiliary losses. The loss function of our model was the sum of the Dice loss and the cross-entropy loss. To evaluate the segmentation results quantitatively, we calculated the Dice similarity coefficient (DSC) and Hausdorff distance (HD). We also compared the volumes obtained from automatic and manual segmentation to test for a statistical difference. Results: Fivefold cross-validation was used to benchmark the segmentation method. The DSC values were 0.87 for the left ventricular myocardium (LVM), 0.94 for the left ventricle (LV), 0.90 for the left atrium (LA), 0.92 for the right ventricle (RV), 0.91 for the right atrium (RA), and 0.96 for the aorta (AO). The average DSC was 0.92, and HD was 7.2 ± 2.1 mm. In the volume comparison, no statistically significant difference was found for any structure except LVM and LA (p < 0.05). The segmentations produced by the proposed method fit the true profile of the cardiac substructures well, and the model predictions were close to the manual annotations. Conclusions: The dual-input transformer architecture based on visual saliency shows high sensitivity and specificity for cardiac structure segmentation and can clearly improve the accuracy of automatic substructure segmentation. This is of gr
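The loss described in the Methods above (Dice loss plus cross-entropy loss, with auxiliary losses for deep supervision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it handles the binary case on flat lists of foreground probabilities, assumes every auxiliary output has already been resampled to the target resolution, and uses a hypothetical `aux_weight` of 0.4 that does not come from the abstract.

```python
import math


def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), smoothed by eps."""
    intersection = sum(p * t for p, t in zip(probs, target))
    denominator = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def cross_entropy_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def combined_loss(probs, target):
    """Sum of Dice loss and cross-entropy loss, as stated in the abstract."""
    return dice_loss(probs, target) + cross_entropy_loss(probs, target)


def deep_supervision_loss(main_probs, aux_probs_list, target, aux_weight=0.4):
    """Main loss plus weighted auxiliary losses from intermediate outputs.

    aux_weight is an assumed value chosen for illustration only.
    """
    loss = combined_loss(main_probs, target)
    for aux_probs in aux_probs_list:
        loss += aux_weight * combined_loss(aux_probs, target)
    return loss
```

Each auxiliary term gives an intermediate decoder output its own gradient signal, which is the usual motivation for deep supervision. How many auxiliary heads to use and how to weight them are design choices the abstract does not specify.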

Articles published on Deep Supervision Strategy

Cloud Detection Using a UNet3+ Model with a Hybrid Swin Transformer and EfficientNet (UNet3+STE) for Very-High-Resolution Satellite Imagery

Gca-pvt-net: group convolutional attention and PVT dual-branch network for oracle bone drill chisel segmentation

Edge and dense attention U-net for atrial scar segmentation in LGE-MRI

LMFormer: Lightweight and multi-feature perspective via transformer for human pose estimation

PRT-Net: a progressive refinement transformer for dose prediction to guide ovarian transposition.

A novel non-pretrained deep supervision network for polyp segmentation

Transfer-Aware Graph U-Net with Cross-Level Interactions for PolSAR Image Semantic Segmentation

Progressive deep snake for instance boundary extraction in medical images

Deeply Supervised Skin Lesions Diagnosis With Stage and Branch Attention.

A Triplet Network Fusing Optical and SAR Images for Colored Steel Building Extraction.

FDR-TransUNet: A novel encoder-decoder architecture with vision transformer for improved medical image segmentation

An attention-guided network for bilateral ventricular segmentation in pediatric echocardiography

High‐resolution optical remote sensing image change detection based on dense connection and attention feature fusion network

SMTF: Sparse transformer with multiscale contextual fusion for medical image segmentation

TPFR-Net: U-shaped model for lung nodule segmentation based on transformer pooling and dual-attention feature reorganization.

OdeBERT: One-stage Deep-supervised Early-exiting BERT for Fast Inference in User Intent Classification

Context–content collaborative network for building extraction from high-resolution imagery

ACPA-Net: Atrous Channel Pyramid Attention Network for Segmentation of Leakage in Rail Tunnel Linings

What happens next? Combining enhanced multilevel script learning and dual fusion strategies for script event prediction

The auto segmentation for cardiac structures using a dual-input deep learning network based on vision saliency and transformer.
