Large Semantic Gap Research Articles

Objective. Medical image segmentation is essential for helping clinicians reach quick and accurate diagnoses. However, most existing methods still suffer from loss of semantic information, blurred boundaries, and a large semantic gap between the encoder and decoder. Approach. To tackle these issues, a dual semantic aggregation transformer with dual attention (D-SAT) is proposed for medical image segmentation. First, a dual-semantic feature aggregation module is designed to bridge the convolutional neural network (CNN) and the Transformer, combining the CNN's ability to capture local feature detail with the Transformer's long-range modeling ability to mitigate semantic information loss. Next, a strip spatial attention mechanism is introduced to alleviate blurred boundaries during encoding by constructing pixel-level feature relations across CSWin Transformer blocks from different spatial dimensions. Finally, a feature distribution gated attention module is placed in the skip connection between the encoder and decoder to reduce the large semantic gap by filtering out noise in low-level semantic information when low-level and high-level semantic features are fused during decoding. Main results. Comprehensive experiments on abdominal multi-organ segmentation, cardiac diagnosis, polyp segmentation and skin lesion segmentation validate the generalization and effectiveness of D-SAT. Both subjective and objective evaluations indicate that D-SAT outperforms current state-of-the-art methods in segmentation accuracy and quality. Significance. The proposed method preserves both local feature details and global context information in medical image segmentation, providing valuable support for improving clinicians' diagnostic efficiency and early disease control for patients. Code is available at https://github.com/Dxkm/D-SAT.

Depth estimation is crucial for scene understanding and downstream tasks, and self-supervised training methods in particular show great potential. Both the overall structure and the local details of a scene are essential for improving depth estimation quality. Monodepth2 brought significant progress in self-supervised monocular depth estimation. However, Monodepth2 uses the most basic encoder–decoder architecture. The limited information flow in the network leads to a large semantic gap between the encoder and the decoder, which reduces the network's accuracy in recognizing fine-grained features. Monodepth2 adopts ResNet18 pre-trained on the ImageNet dataset as the encoder. This traditional convolution-and-pooling structure loses pixel information at every scale. To solve this problem, this paper proposes an improved DepthNet. The network adopts HRNet, as used in semantic segmentation, as the base encoder; HRNet applies multi-scale fusion throughout the whole process, thus avoiding the loss of pixel information. An additional densely connected U-Net is employed on the decoder side to provide more information flow. Furthermore, the semantic gap between the encoder and decoder is reduced by adding different numbers of residual connections and channel attention on each layer. The network structure can be regarded as a collection of fully convolutional networks. Since the deep features of the network correlate more strongly with vertical position, a spatial location attention module is added to the deep levels of the network to reduce this semantic gap. The approach performs well on the KITTI benchmark, with several performance metrics comparable to those of supervised monocular depth inference methods.

Articles published on Large Semantic Gap

Query-Oriented Micro-Video Summarization

PA-YOLO-Based Multifault Defect Detection Algorithm for PV Panels

Automatic Recognition of Agriculture Pests with Balanced Feature Pyramid Network

D-SAT: dual semantic aggregation transformer with dual attention for medical image segmentation

UCFilTransNet: Cross-Filtering Transformer-based network for CT image segmentation

Semantic Embedding Guided Attention with Explicit Visual Feature Fusion for Video Captioning

DCU-NET: Self-supervised monocular depth estimation based on densely connected U-shaped convolutional neural networks

Learning Weak Semantics by Feature Graph for Attribute-Based Person Search

CSBR: A Compositional Semantics-Based Service Bundle Recommendation Approach for Mashup Development

ForkNet: Strong Semantic Feature Representation and Subregion Supervision for Accurate Remote Sensing Change Detection

Discrete Semantics-Guided Asymmetric Hashing for Large-Scale Multimedia Retrieval

FLASH: Fast, Parallel, and Accurate Simulator for HLS

Multi-Res-Attention UNet: A CNN Model for the Segmentation of Focal Cortical Dysplasia Lesions from Magnetic Resonance Images

Learning-Based Computer-Aided Prescription Model for Parkinson's Disease: A Data-Driven Perspective

Deep Reinforcement Polishing Network for Video Captioning

DF²Net: Discriminative Feature Learning and Fusion Network for RGB-D Indoor Scene Classification

Scene Semantic Understanding Based on the Spatial Context Relations of Multiple Objects

A novel dynamic multi-model relevance feedback procedure for content-based image retrieval

Category co-occurrence modeling for large scale scene recognition

Anomaly Detection via Midlevel Visual Attributes
