Motion Capture Dataset Research Articles

Existing methods for 3D tracking from monocular RGB videos predominantly consider articulated and rigid objects ( e.g. , two hands or humans interacting with rigid environments). Modelling dense non-rigid object deformations in this setting ( e.g. when hands are interacting with a face), remained largely unaddressed so far, although such effects can improve the realism of the downstream applications such as AR/VR, 3D virtual avatar communications, and character animations. This is due to the severe ill-posedness of the monocular view setting and the associated challenges ( e.g. , in acquiring a dataset for training and evaluation or obtaining the reasonable non-uniform stiffness of the deformable object). While it is possible to naïvely track multiple non-rigid objects independently using 3D templates or parametric 3D models, such an approach would suffer from multiple artefacts in the resulting 3D estimates such as depth ambiguity, unnatural intra-object collisions and missing or implausible deformations. Hence, this paper introduces the first method that addresses the fundamental challenges depicted above and that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos. We model hands as articulated objects inducing non-rigid face deformations during an active interaction. Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system. As a pivotal step in its creation, we process the reconstructed raw 3D shapes with position-based dynamics and an approach for non-uniform stiffness estimation of the head tissues, which results in plausible annotations of the surface deformations, hand-face contact regions and head-hand positions. At the core of our neural approach are a variational auto-encoder supplying the hand-face depth prior and modules that guide the 3D tracking by estimating the contacts and the deformations. Our final 3D hand and face reconstructions are realistic and more plausible compared to several baselines applicable in our setting, both quantitatively and qualitatively. https://vcai.mpi-inf.mpg.de/projects/Decaf

There has been a growing interest in multimodal sentiment analysis and emotion recognition in recent years due to its wide range of practical applications. Multiple modalities allow for the integration of complementary information, improving the accuracy and precision of sentiment and emotion recognition tasks. However, working with multiple modalities presents several challenges, including handling data source heterogeneity, fusing information, aligning and synchronizing modalities, and designing effective feature extraction techniques that capture discriminative information from each modality. This paper introduces a novel framework called “Attention-based Multimodal Sentiment Analysis and Emotion Recognition (AMSAER)” to address these challenges. This framework leverages intra-modality discriminative features and inter-modality correlations in visual, audio, and textual modalities. It incorporates an attention mechanism to facilitate sentiment and emotion classification based on visual, textual, and acoustic inputs by emphasizing relevant aspects of the task. The proposed approach employs separate models for each modality to automatically extract discriminative semantic words, image regions, and audio features. A deep hierarchical model is then developed, incorporating intermediate fusion to learn hierarchical correlations between the modalities at bimodal and trimodal levels. Finally, the framework combines four distinct models through decision-level fusion to enable multimodal sentiment analysis and emotion recognition. The effectiveness of the proposed framework is demonstrated through extensive experiments conducted on the publicly available Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. The results confirm a notable performance improvement compared to state-of-the-art methods, attaining 85% and 93% accuracy for sentiment analysis and emotion classification, respectively. Additionally, when considering class-wise accuracy, the results indicate that the “angry” emotion and “positive” sentiment are classified more effectively than the other emotions and sentiments, achieving 96.80% and 93.14% accuracy, respectively.

Motion Capture Dataset Research Articles

Related Topics

Articles published on Motion Capture Dataset

DTP: learning to estimate full-body pose in real-time from sparse VR sensor measurements

Validity of muscle activation estimated with predicted ground reaction force in inverse dynamics based musculoskeletal simulation during gait

Lifting by Image – Leveraging Image Cues for Accurate 3D Human Pose Estimation

Decaf: Monocular Deformation Capture for Face and Hand Interactions

Markerless Motion Tracking With Noisy Video and IMU Data.

SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

4D radar simulator for human activity recognition

Speech emotion recognition with light gradient boosting decision trees machine

Attention-based multimodal sentiment analysis and emotion recognition using deep neural networks

Multi-Level Attention-Based Categorical Emotion Recognition Using Modulation-Filtered Cochleagram

DeepCNN: Spectro‐temporal feature representation for speech emotion recognition

Speech emotion recognition based on emotion perception

Modeling Multiple Temporal Scales of Full-Body Movements for Emotion Classification

An Effective and Efficient Approach for 3D Recovery of Human Motion Capture Data.

IoT and Deep Learning-Based Farmer Safety System

A Music-Driven Deep Generative Adversarial Model for Guzheng Playing Animation.

Motion Capture and Tracking for Animation and Robotics

PERGAMO: Personalized 3D Garments from Monocular Video

Complexity of locomotion activities in an outside-of-the-lab wearable motion capture dataset.

BAT: Block and token self-attention for speech emotion recognition

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Motion Capture Dataset Research Articles

Related Topics

Articles published on Motion Capture Dataset

DTP: learning to estimate full-body pose in real-time from sparse VR sensor measurements

Validity of muscle activation estimated with predicted ground reaction force in inverse dynamics based musculoskeletal simulation during gait

Lifting by Image – Leveraging Image Cues for Accurate 3D Human Pose Estimation

Decaf: Monocular Deformation Capture for Face and Hand Interactions

Markerless Motion Tracking With Noisy Video and IMU Data.

SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

4D radar simulator for human activity recognition

Speech emotion recognition with light gradient boosting decision trees machine

Attention-based multimodal sentiment analysis and emotion recognition using deep neural networks

Multi-Level Attention-Based Categorical Emotion Recognition Using Modulation-Filtered Cochleagram

DeepCNN: Spectro‐temporal feature representation for speech emotion recognition

Speech emotion recognition based on emotion perception

Modeling Multiple Temporal Scales of Full-Body Movements for Emotion Classification

An Effective and Efficient Approach for 3D Recovery of Human Motion Capture Data.

IoT and Deep Learning-Based Farmer Safety System

A Music-Driven Deep Generative Adversarial Model for Guzheng Playing Animation.

Motion Capture and Tracking for Animation and Robotics

PERGAMO: Personalized 3D Garments from Monocular Video

Complexity of locomotion activities in an outside-of-the-lab wearable motion capture dataset.

BAT: Block and token self-attention for speech emotion recognition