Hierarchical Fusion Research Articles

The tomato plant’s main-stem is a feasible lead for robotic searching the grows discretely-growing targets of harvesting, pruning or pollinating. Owing to the highlighted reflection characteristics of the main-stem in the near-infrared (NIR) waveband, this study proposes a multimodal hierarchical fusion method (YOLACTFusion) based on the attention mechanism, to achieve an instance segmentation of the main-stem from similar-colored differentiation (i.e., green leaf and green fruit) in robotic vision systems. The model inputs RGB images and 900–1100 nm NIR images into two ResNet50 backbone networks and uses a parallel attention mechanism to fuse feature maps of various scales together into the head network, to improve the segmentation performance of the main-stem of RGB images. The loss function for the multimodal image weights the original loss on the RGB image and the position offset loss and classification loss on the NIR image. Furthermore, the local depthwise separable convolution is used for the backbone network, and Conv-BN layers are merged to reduce the computational complexity. The results show that the precision and recall of YOLACTFusion of the main-stem detection, respectively reached 93.90 % and 62.60 %; and the precision and recall of instance segmentation reached 95.12 % and 63.41 %, respectively. Compared to YOLACT, the mean average precision (mAP) of YOLACTFusion is increased from 39.20 % to 46.29 %, the model size is reduced from 199.03 MB to 165.52 MB, while the image processing efficiency remains similar. The overall results show that the multimodal instance segmentation method proposed in this study significantly improves the detection and segmentation of tomato main-stems under a similar-colored background, which would be a potential method for improving agricultural robot’s visual perception.

In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of sentiments by fusing multimodal information, thereby enhancing the understanding of real-world environments. The key challenges lie in handling the noise in the acquired data and achieving effective multimodal fusion. When processing the noise in data, existing methods utilize the combination of multimodal features to mitigate errors in sentiment word recognition caused by the performance limitations of automatic speech recognition (ASR) models. However, there still remains the problem of how to more efficiently utilize and combine different modalities to address the data noise. In multimodal fusion, most existing fusion methods have limited adaptability to the feature differences between modalities, making it difficult to capture the potential complex nonlinear interactions that may exist between modalities. To overcome the aforementioned issues, this paper proposes a new framework named multimodal-word-refinement and cross-modal-hierarchy (MWRCMH) fusion. Specifically, we utilized a multimodal word correction module to reduce sentiment word recognition errors caused by ASR. During multimodal fusion, we designed a cross-modal hierarchical fusion module that employed cross-modal attention mechanisms to fuse features between pairs of modalities, resulting in fused bimodal-feature information. Then, the obtained bimodal information and the unimodal information were fused through the nonlinear layer to obtain the final multimodal sentiment feature information. Experimental results on the MOSI-SpeechBrain, MOSI-IBM, and MOSI-iFlytek datasets demonstrated that the proposed approach outperformed other comparative methods, achieving Has0-F1 scores of 76.43%, 80.15%, and 81.93%, respectively. Our approach exhibited better performance, as compared to multiple baselines.

Hierarchical Fusion Research Articles

Related Topics

Articles published on Hierarchical Fusion

HF2TNet: A Hierarchical Fusion Two-stage Training Network for Infrared and Visible Image Fusion

HiSIF-DTA: A Hierarchical Semantic Information Fusion Framework for Drug-Target Affinity Prediction.

Automated Diagnosis of Major Depressive Disorder With Multi-Modal MRIs Based on Contrastive Learning: A Few-Shot Study.

Indoor Point Cloud Segmentation Based on Hierarchical Fusion of Structural Features

HFA-GTNet: Hierarchical Fusion Adaptive Graph Transformer network for dance action recognition

Multimodal Mutual Attention-Based Sentiment Analysis Framework Adapted to Complicated Contexts

Space-time multi-level modeling for zooplankton abundance employing double data fusion and calibration

Hierarchical multimodal-fusion of physiological signals for emotion recognition with scenario adaption and contrastive alignment

Intangible cultural heritage image classification with multimodal attention and hierarchical fusion

Hierarchical fusion evaluation and optimization of radar intelligent tracking algorithm via hybrid weight design mechanism

A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition

MIA-Net: Multi-Modal Interactive Attention Network for Multi-Modal Affective Analysis

Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-Based Sentiment Analysis

Full stage networks with auxiliary focal loss and multi-attention module for submarine garbage object detection

Multi-modal hierarchical fusion network for fine-grained paper classification

YOLACTFusion: An instance segmentation method for RGB-NIR multimodal image fusion based on an attention mechanism

Hierarchical Fusion Network with Enhanced Knowledge and Contrastive Learning for Multimodal Aspect-Based Sentiment Analysis on Social Media.

Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network

FURSformer: Semantic Segmentation Network for Remote Sensing Images with Fused Heterogeneous Features

STATE: Learning structure and texture representations for novel view synthesis

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Hierarchical Fusion Research Articles

Related Topics

Articles published on Hierarchical Fusion

HF2TNet: A Hierarchical Fusion Two-stage Training Network for Infrared and Visible Image Fusion

HiSIF-DTA: A Hierarchical Semantic Information Fusion Framework for Drug-Target Affinity Prediction.

Automated Diagnosis of Major Depressive Disorder With Multi-Modal MRIs Based on Contrastive Learning: A Few-Shot Study.

Indoor Point Cloud Segmentation Based on Hierarchical Fusion of Structural Features

HFA-GTNet: Hierarchical Fusion Adaptive Graph Transformer network for dance action recognition

Multimodal Mutual Attention-Based Sentiment Analysis Framework Adapted to Complicated Contexts

Space-time multi-level modeling for zooplankton abundance employing double data fusion and calibration

Hierarchical multimodal-fusion of physiological signals for emotion recognition with scenario adaption and contrastive alignment

Intangible cultural heritage image classification with multimodal attention and hierarchical fusion

Hierarchical fusion evaluation and optimization of radar intelligent tracking algorithm via hybrid weight design mechanism

A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition

MIA-Net: Multi-Modal Interactive Attention Network for Multi-Modal Affective Analysis

Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-Based Sentiment Analysis

Full stage networks with auxiliary focal loss and multi-attention module for submarine garbage object detection

Multi-modal hierarchical fusion network for fine-grained paper classification

YOLACTFusion: An instance segmentation method for RGB-NIR multimodal image fusion based on an attention mechanism

Hierarchical Fusion Network with Enhanced Knowledge and Contrastive Learning for Multimodal Aspect-Based Sentiment Analysis on Social Media.

Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network

FURSformer: Semantic Segmentation Network for Remote Sensing Images with Fused Heterogeneous Features

STATE: Learning structure and texture representations for novel view synthesis