The visual information conveyed by images in multimodal relation extraction (MRE) often contains details that are difficult to express in text. Integrating textual and visual information is therefore the mainstream approach to improving the understanding and extraction of relations between entities. However, existing MRE methods neglect the semantic gap caused by data heterogeneity. Moreover, some approaches map the relations between target objects in image scene graphs to text, where large numbers of invalid visual relations introduce noise. To alleviate these problems, we propose a novel multimodal relation extraction method based on cooperative enhancement of dual-channel visual semantic information (CE-DCVSI). Specifically, to mitigate the semantic gap between modalities, we perform fine-grained semantic alignment between entities and target objects through multimodal heterogeneous graphs, using a heterogeneous graph Transformer to project the feature representations of the different modalities into a shared semantic space, which improves the consistency and accuracy of the representations. To eliminate the effect of useless visual relations, we perform multi-scale feature fusion between different levels of visual information and the textual representations, increasing the complementarity between features and improving the comprehensiveness and robustness of the multimodal representation. Finally, we apply the information bottleneck principle to filter invalid information out of the multimodal representation, mitigating the negative impact of irrelevant noise. Experiments show that the method achieves an F1 score of 86.08% on the publicly available MRE dataset, outperforming the baseline methods.
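The abstract does not give implementation details. As a rough illustration of the final filtering step only, the sketch below shows one common way to apply the information bottleneck principle to a fused multimodal vector: a variational IB with a Gaussian prior, where a KL penalty discourages the latent code from carrying task-irrelevant noise. The class and parameter names (IBFilter, fused_dim, beta, etc.) are hypothetical and are not taken from the paper; this is a minimal sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBFilter(nn.Module):
    """Variational information-bottleneck filter (illustrative): compresses a
    fused multimodal representation into a stochastic latent code and penalises
    its KL divergence to a standard normal prior."""

    def __init__(self, fused_dim: int, latent_dim: int, num_relations: int):
        super().__init__()
        self.mu = nn.Linear(fused_dim, latent_dim)      # posterior mean
        self.logvar = nn.Linear(fused_dim, latent_dim)  # posterior log-variance
        self.classifier = nn.Linear(latent_dim, num_relations)

    def forward(self, fused: torch.Tensor, labels: torch.Tensor, beta: float = 1e-3):
        mu, logvar = self.mu(fused), self.logvar(fused)
        # reparameterisation trick: sample z ~ N(mu, sigma^2)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.classifier(z)
        # KL(q(z|x) || N(0, I)) -- the "compression" term of the IB objective
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
        loss = F.cross_entropy(logits, labels) + beta * kl
        return logits, loss

# Toy usage: a batch of 4 fused text-image vectors and 23 relation classes
# (both sizes are arbitrary placeholders).
fused = torch.randn(4, 256)
labels = torch.randint(0, 23, (4,))
model = IBFilter(fused_dim=256, latent_dim=64, num_relations=23)
logits, loss = model(fused, labels)
loss.backward()
```

The trade-off between relation classification accuracy and compression is controlled by the weight on the KL term (beta above); a larger weight filters more aggressively but may also discard useful multimodal evidence.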