Heart rate (HR) is an important indicator of overall physical and mental health and plays a crucial role in diagnosing cardiovascular and neurological diseases. Recent research has shown that variations in the light absorbed by human skin over the cardiac cycle, caused by changes in blood volume and captured in facial video, can be used for non-contact HR estimation. However, most existing methods rely on a single video modality (such as RGB or near-infrared (NIR)), which often yields suboptimal results due to noise and the limitations of a single information source. To overcome these challenges, this paper proposes a multimodal information fusion architecture, the spatiotemporal sensitive network (SS-Net), for non-contact HR estimation. First, spatiotemporal feature maps are used to extract physiological signals from RGB and NIR videos effectively. Next, a spatiotemporal sensitive (SS) module is introduced to extract useful physiological signal information from both the RGB and NIR spatiotemporal maps. Finally, a multi-level spatiotemporal context fusion (MLSC) module is designed to fuse and complement information between the visible-light and infrared modalities, and the fused features at different levels are refined in task-specific branches to predict both the remote photoplethysmography (rPPG) signal and HR. Experiments on three datasets demonstrate that the proposed SS-Net achieves superior performance compared with existing methods.
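The abstract describes the pipeline only at a high level; as a concrete illustration, a minimal PyTorch sketch of such a two-stream fusion design is given below. All class names (SSBlock, MLSCFusion, SSNetSketch), layer sizes, the sigmoid gating used to model spatiotemporal sensitivity, and the concatenation-plus-1x1-convolution fusion are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SSBlock(nn.Module):
    """Hypothetical spatiotemporal-sensitive block: 2D convolutions over a
    spatiotemporal map (ROI rows x time columns), followed by a sigmoid gate
    that re-weights temporally informative regions. The gating design is an
    assumption, not the paper's SS module."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.gate = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.conv(x)
        return f * self.gate(f)  # attention-weighted features

class MLSCFusion(nn.Module):
    """Hypothetical multi-level fusion step: concatenate RGB and NIR features
    at one level and mix them with a 1x1 convolution."""
    def __init__(self, ch):
        super().__init__()
        self.mix = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, rgb_feat, nir_feat):
        return self.mix(torch.cat([rgb_feat, nir_feat], dim=1))

class SSNetSketch(nn.Module):
    """Two-stream sketch: SS blocks per modality, fusion at two levels,
    then task-specific heads for the rPPG waveform and the HR value."""
    def __init__(self, T=300):
        super().__init__()
        self.rgb1, self.nir1 = SSBlock(3, 32), SSBlock(1, 32)
        self.rgb2, self.nir2 = SSBlock(32, 64), SSBlock(32, 64)
        self.fuse1, self.fuse2 = MLSCFusion(32), MLSCFusion(64)
        self.pool = nn.AdaptiveAvgPool2d((1, T))  # collapse ROI axis, keep time
        self.rppg_head = nn.Conv1d(32 + 64, 1, kernel_size=1)  # waveform head
        self.hr_head = nn.Sequential(               # scalar HR head
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32 + 64, 1))

    def forward(self, rgb_map, nir_map):
        # rgb_map: (B, 3, R, T) and nir_map: (B, 1, R, T) spatiotemporal maps
        r1, n1 = self.rgb1(rgb_map), self.nir1(nir_map)
        f1 = self.fuse1(r1, n1)                  # level-1 fused features
        r2, n2 = self.rgb2(r1), self.nir2(n1)
        f2 = self.fuse2(r2, n2)                  # level-2 fused features
        t1 = self.pool(f1).squeeze(2)            # (B, 32, T)
        t2 = self.pool(f2).squeeze(2)            # (B, 64, T)
        feats = torch.cat([t1, t2], dim=1)       # multi-level temporal features
        rppg = self.rppg_head(feats).squeeze(1)  # (B, T) rPPG waveform
        hr = self.hr_head(feats).squeeze(1)      # (B,) HR estimate
        return rppg, hr

# Example: 300-frame spatiotemporal maps built from 25 facial ROIs
rgb = torch.randn(2, 3, 25, 300)
nir = torch.randn(2, 1, 25, 300)
rppg, hr = SSNetSketch(T=300)(rgb, nir)
print(rppg.shape, hr.shape)  # torch.Size([2, 300]) torch.Size([2])
```

In this sketch, the two output heads share multi-level fused features, which mirrors the abstract's description of refining different levels of fused features in task-specific branches for the rPPG and HR predictions.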