Abstract

This study introduces a novel approach to emotion recognition by amalgamating information from heterogeneous modalities, specifically audio and video. We employed techniques such as energy, zero crossing rate, and Mel-Frequency Cepstral Coefficients (MFCC) for audio feature extraction, which showed promising results. For video feature extraction, spatialtemporal Gaussian kernels were used to organize video frames within a linear scale space, followed by the application of a Gaussian-weighted function to the second momentum matrix for further feature extraction. The Multimodal Feature Aggregation (MFA) fusion method was employed to unify audio and video features, resulting in a comprehensive dataset. Evaluation through the Fusion of Emotion Recognition Convolutional Neural Network (FERCNN) model, supported by the "TPU VM v3-8" accelerator TPU is a Tensor Processing Unit, showcased notable performance improvements. Using the RAVDESS and CREMAD datasets, accuracies of 94.66%, 95.82%, and 94.36% in the RAVDESS dataset and 79.45%, 96.62%, and 70.14% in the CREMAD dataset for audio, video, and multimodal modalities, respectively, were achieved. These outcomes surpass the capabilities of existing multimodal systems, underscoring the efficacy of our proposed approach. Emotion recognition, particularly through multimodal means, plays a critical role in various domains, including human-computer interfaces, healthcare, legal proceedings, and entertainment. Fusing Audio and Video Modalities to Elevate Human-Computer Interaction and Intelligent System Performance is essential for enhancing communication within these domains. The proposed model is termed "DualVision EmotionNet: DV EmotionNet".

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.