Intermodal Interactions Research Articles

Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, because the boundaries between tumor sub-regions are indistinct and gliomas appear heterogeneous in volumetric MR scans, designing a reliable, automated glioma segmentation method remains challenging. Although existing 3D Transformer-based and convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they generally lack hierarchical interactions between modalities and cannot effectively learn comprehensive feature representations for all glioma sub-regions. To overcome these problems, we propose a 3D hierarchical cross-modality interaction network (HCMINet) that uses Transformers and convolutions for accurate multi-modal glioma segmentation. It leverages a hierarchical cross-modality interaction strategy to learn modality-specific and modality-shared knowledge relevant to glioma sub-region segmentation from multi-parametric MR images. In HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder that hierarchically encodes and fuses heterogeneous multi-modal features through Transformer-based intra-modal embeddings and inter-modal interactions across multiple encoding stages, capturing complex cross-modality correlations while modeling global context. Then, we combine the HCMITrans encoder with a modality-shared convolutional encoder to form a dual-encoder architecture in the encoding stage, which learns rich contextual information from both global and local perspectives.
Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder that progressively fuses the local and global features extracted by the dual-encoder architecture, using a local-global context fusion (LGCF) module to alleviate the contextual discrepancy among the decoding features. Extensive experiments are conducted on two public glioma benchmark datasets: BraTS2020 with 494 patients and BraTS2021 with 1251 patients. Results show that our proposed method outperforms the existing Transformer-based and CNN-based methods with other multi-modal fusion strategies that we compared against. Specifically, the proposed HCMINet achieves state-of-the-art mean Dice similarity coefficient (DSC) values of 85.33% and 91.09% on the BraTS2020 online validation dataset and the BraTS2021 local testing dataset, respectively. Our proposed method can accurately and automatically segment glioma regions from multi-parametric MR images, which benefits the quantitative analysis of brain gliomas and helps reduce the annotation burden on neuroradiologists.
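The mean DSC reported above is built on the standard Dice overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric, not the authors' evaluation code: the function names `dice_score` and `mean_dice`, the region keys, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two flattened binary masks."""
    pred = [bool(p) for p in pred]
    target = [bool(t) for t in target]
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled positive in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pred_by_region, target_by_region):
    """Average the Dice score over named regions (hypothetical keys)."""
    scores = [
        dice_score(pred_by_region[name], target_by_region[name])
        for name in target_by_region
    ]
    return sum(scores) / len(scores)


# Toy example with two hypothetical regions of four voxels each.
pred = {"region_a": [1, 1, 0, 0], "region_b": [0, 1, 1, 0]}
target = {"region_a": [1, 0, 0, 0], "region_b": [0, 1, 1, 0]}
print(mean_dice(pred, target))
```

In practice the masks would be flattened 3D volumes, one per tumor sub-region, and the per-region scores would be averaged across patients as well.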

Visual Question Answering (VQA) is a task that requires a model to fully understand the visual information in an image and the language information in a question, and then combine both to provide an answer. Recently, many VQA approaches have focused on modeling intra- and inter-modal interactions between vision and language using a deep modular co-attention network, which can achieve good performance. Despite their benefits, these approaches have limitations. First, the question representation is obtained through GloVe word embeddings and a recurrent neural network, which may not be sufficient to capture the intricate semantics of the question. Second, they mostly use visual appearance features extracted by Faster R-CNN to interact with language features and ignore important spatial relations between objects in images, resulting in incomplete use of image information. To overcome these limitations, we propose a novel Multi-modal Spatial Relation Attention Network (MSRAN) for VQA, which introduces spatial relationships between objects to make fuller use of the image information and thereby improve VQA performance. To this end, we design two types of spatial relational attention modules: (i) a Self-Attention based on Explicit Spatial Relation (SA-ESR) module that explores geometric relationships between objects explicitly; and (ii) a Self-Attention based on Implicit Spatial Relation (SA-ISR) module that captures the hidden dynamic relationships between objects by using spatial relationships. Moreover, the pre-trained model BERT replaces the GloVe word embeddings and recurrent neural network in MSRAN to obtain a better question representation. Extensive experiments on two large benchmark datasets, VQA 2.0 and GQA, demonstrate that our proposed model achieves state-of-the-art performance.
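One way to picture self-attention that uses explicit geometric relations is to add a bias computed from pairwise bounding-box geometry to the usual content-based attention logits. The sketch below illustrates that general idea only; it is not the SA-ESR module from the paper. The log-ratio geometry features, the weight vector `w_geo`, and all function names are assumptions chosen for this example.

```python
import math


def relative_geometry(box_i, box_j):
    """Four log-scale features of box_j relative to box_i.

    Boxes are (x1, y1, x2, y2) with positive width and height.
    """
    wi, hi = box_i[2] - box_i[0], box_i[3] - box_i[1]
    wj, hj = box_j[2] - box_j[0], box_j[3] - box_j[1]
    cxi, cyi = box_i[0] + wi / 2, box_i[1] + hi / 2
    cxj, cyj = box_j[0] + wj / 2, box_j[1] + hj / 2
    return [
        math.log(max(abs(cxj - cxi), 1e-3) / wi),  # horizontal offset
        math.log(max(abs(cyj - cyi), 1e-3) / hi),  # vertical offset
        math.log(wj / wi),                         # width ratio
        math.log(hj / hi),                         # height ratio
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def spatial_attention_weights(feats, boxes, w_geo):
    """Attention weights from content similarity plus a geometry bias."""
    d = len(feats[0])
    weights = []
    for fi, bi in zip(feats, boxes):
        row = []
        for fj, bj in zip(feats, boxes):
            # Scaled dot-product similarity between object features.
            content = sum(a * b for a, b in zip(fi, fj)) / math.sqrt(d)
            # Hypothetical linear projection of the geometry features.
            geo = sum(w * g for w, g in zip(w_geo, relative_geometry(bi, bj)))
            row.append(content + geo)
        weights.append(softmax(row))
    return weights


# Three toy objects with 2-D features and pixel-coordinate boxes.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 40)]
w_geo = [-0.5, -0.5, 0.0, 0.0]  # assumed: distant objects get less weight
weights = spatial_attention_weights(feats, boxes, w_geo)
```

Each row of `weights` sums to one, so it can be used to average the object features. In a trained network `w_geo` would be learned rather than fixed by hand.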


Articles published on Intermodal Interactions

A 3D hierarchical cross-modality interaction network using transformers and convolutions for brain glioma segmentation in MR images.

Prevention and adaptation of intermodal interactive seaports and dry ports under asymmetric risk behavior

Spatial-Temporal Co-Attention Learning for Diagnosis of Mental Disorders From Resting-State fMRI Data.

Joint training strategy of unimodal and multimodal for multimodal sentiment analysis

Follower attraction in live streaming: Knowledge driven by PKM and data driven by EF-LSTM

Hierarchical multimodal self-attention-based graph neural network for DTI prediction.

A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism

Emotion-aware hierarchical interaction network for multimodal image aesthetics assessment

Nonlinear scaling of fluctuation kinetic energy for shock–vorticity wave interaction

A Dual-Attention Learning Network With Word and Sentence Embedding for Medical Visual Question Answering.

QueryTrack: Joint-Modality Query Fusion Network for RGBT Tracking.

Hybrid cross-modal interaction learning for multimodal sentiment analysis

Hierarchical Synergy-Enhanced Multimodal Relational Network for Video Question Answering

An attention-based multi-modal MRI fusion model for major depressive disorder diagnosis

A co-attention based multi-modal fusion network for review helpfulness prediction

Intangible cultural heritage image classification with multimodal attention and hierarchical fusion

Multi-modal spatial relational attention networks for visual question answering

Multimodal Sentiment Analysis Based on Attentional Temporal Convolutional Network and Multi-Layer Feature Fusion

Transformer-Based Multi-Modal Data Fusion Method for COPD Classification and Physiological and Biochemical Indicators Identification.

A Visually Enhanced Neural Encoder for Synset Induction

