Image Semantic Information Research Articles

Semantic segmentation of urban street scenes has attracted much attention in autonomous driving: it helps vehicles perceive the environment in real time and significantly improves the decision-making ability of autonomous driving systems. However, most current methods based on Convolutional Neural Networks (CNNs) encode the input image to a low resolution and then try to recover the high resolution, which leads to loss of spatial information, accumulation of errors, and difficulty in dealing with large-scale changes. To address these problems, we propose HRDLNet, a new semantic segmentation network for urban street scene images that improves segmentation accuracy by maintaining a high-resolution representation of the image throughout. Specifically, we first propose a feature extraction module with high-resolution representation (FHR), which handles multi-scale targets and high-resolution image information by efficiently fusing high-resolution information and multi-scale features. Second, we design a multi-scale feature extraction enhancement (MFE) module, which significantly expands the receptive field of the network and thus enhances its ability to capture correlations between image details and global contextual information. In addition, we introduce a dual-attention mechanism module (CSD), which dynamically adjusts the network to capture subtle features and rich semantic information in images more accurately. We trained and evaluated HRDLNet on the Cityscapes dataset and the PASCAL VOC 2012 Augmented dataset and verified the model's excellent performance in urban street scene image segmentation. Comparison with state-of-the-art methods further verifies the advantages of HRDLNet for semantic segmentation of urban street scenes.
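To make the idea of a dual-attention module more concrete, the following is a minimal NumPy sketch of a generic channel-then-spatial attention gate. It is an illustration only, not the CSD module from the paper: the abstract does not describe CSD's internals, so the function names (`channel_attention`, `spatial_attention`, `dual_attention`) and the parameter-free sigmoid gates are assumptions. Real modules of this kind typically use learned layers in place of the fixed pooling used here.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Reweight each channel of a (C, H, W) feature map.

    Hypothetical simplification: the gate is the sigmoid of the
    global average of each channel, with no learned parameters.
    """
    pooled = feat.mean(axis=(1, 2))        # (C,)
    gate = sigmoid(pooled)                 # (C,), one weight per channel
    return feat * gate[:, None, None]


def spatial_attention(feat):
    """Reweight each spatial location of a (C, H, W) feature map.

    Hypothetical simplification: the gate combines the per-pixel
    mean and max over channels, again without learned parameters.
    """
    avg_map = feat.mean(axis=0)            # (H, W)
    max_map = feat.max(axis=0)             # (H, W)
    gate = sigmoid(avg_map + max_map)      # (H, W), one weight per pixel
    return feat * gate[None, :, :]


def dual_attention(feat):
    """Apply channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(feat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(4, 8, 8))  # toy (C, H, W) feature map
    print(dual_attention(features).shape)
```

Both gates only rescale activations, so the output keeps the input's shape and can be dropped between existing layers of a network.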


This article focuses on the task of Multi-Modal Summarization with Multi-Modal Output for China JD.COM e-commerce product descriptions that contain both source text and source images. When learning the context of multi-modal (text and image) input, there is a semantic gap between text and image, especially in their cross-modal semantics. As a result, capturing shared cross-modal semantics early becomes crucial for multi-modal summarization. Moreover, because input text and images contribute differently when the multi-modal summary is generated, both the relevance and the irrelevance of multi-modal contexts to the target summary should be considered, so as to optimize cross-modal context learning, guide summary generation, and emphasize the significant semantics within each modality. To address these challenges, Multization is proposed to enhance multi-modal semantic information through multi-contextually relevant and irrelevant attention alignment. Specifically, a Semantic Alignment Enhancement mechanism captures shared semantics between the modalities (text and image) and thereby raises the importance of crucial multi-modal information in the encoding stage. Additionally, the IR-Relevant Multi-Context Learning mechanism observes the summary generation process from both relevant and irrelevant perspectives to form a multi-modal context that incorporates both text and image semantic information. Experimental results on the China JD.COM e-commerce dataset demonstrate that the proposed Multization method effectively captures the shared semantics between the input source text and source images and highlights essential semantics. It also successfully generates a multi-modal summary (including image and text) that comprehensively considers the semantic information of both text and image.
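As a rough illustration of how shared semantics between text and image can be captured with attention, the sketch below computes standard scaled dot-product cross-attention from text tokens to image regions in NumPy. It is not the Semantic Alignment Enhancement mechanism itself, whose details the abstract does not give: the function name `cross_modal_attention`, the shared embedding dimension, and the absence of learned projections are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_attention(text, image):
    """Attend from text tokens to image regions.

    text:  (T, d) array of token embeddings (queries).
    image: (R, d) array of region embeddings (keys and values).
    Both are assumed to live in the same d-dimensional space; a real
    model would first apply learned projections to each modality.
    """
    d = text.shape[-1]
    scores = text @ image.T / np.sqrt(d)   # (T, R) similarity scores
    weights = softmax(scores, axis=-1)     # each row sums to 1
    attended = weights @ image             # (T, d) image context per token
    return attended, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_emb = rng.normal(size=(5, 16))    # 5 toy text tokens
    image_emb = rng.normal(size=(7, 16))   # 7 toy image regions
    context, attn = cross_modal_attention(text_emb, image_emb)
    print(context.shape, attn.shape)
```

Each row of the returned weight matrix indicates how strongly one text token aligns with each image region, which is one simple way to expose cross-modal relevance to a downstream summarizer.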



Articles published on Image Semantic Information

Bi-Interfusion: A bidirectional cross-fusion framework with semantic-guided transformers in LiDAR-camera fusion

DCSS-UNet: UNet based on State Space Model for Polyp Segmentation

Multimodal Recipe Recommendation with Heterogeneous Graph Neural Networks

SAM-RSP: A new few-shot segmentation method based on segment anything model and rough segmentation prompts

HRDLNet: a semantic segmentation network with high resolution representation for urban street view images

Multization: Multi-Modal Summarization Enhanced by Multi-Contextually Relevant and Irrelevant Attention Alignment

Cherry growth modeling based on Prior Distance Embedding contrastive learning: Pre-training, anomaly detection, semantic segmentation, and temporal modeling

LR3S: A lightweight semantic segmentation model for road scenes based on improved DeepLabV3+

Application of Image Segmentation Algorithms in Computer Vision

A multi-label image classification method combining multi-stage image semantic information and label relevance

Region-Focused Network for Dense Captioning

Style-Enhanced Transformer for Image Captioning in Construction Scenes.

Multi-modal sarcasm detection based on Multi-Channel Enhanced Fusion model

Towards Unsupervised Referring Expression Comprehension with Visual Semantic Parsing

MTSTR: Multi-task learning for low-resolution scene text recognition via dual attention mechanism and its application in logistics industry.

Improving the Accuracy of Robot Collecting Organisms in Marine Environment Based on Yolov5 Improvement

AFCANet: An adaptive feature concatenate attention network for multi-focus image fusion

AdvMask: A sparse adversarial attack-based data augmentation method for image classification

A Semantic Information-Based Optimized vSLAM in Indoor Dynamic Environments

A DCT probability histogram-based ROI features for content-based natural and medical image retrieval applications

