Abstract

The lack of vocal emotional expression is a major deficit in social communication disorders. Current work in artificial intelligence focuses on collaborative training of deep learning (DL) models without compromising data privacy. The primary objective of this paper is to propose a federated learning-based classification model to identify and analyze the emotional capabilities of individuals with vocal emotion deficits. The methodology develops a collaborative, privacy-preserving approach that uses federated learning to train deep learning models, with Mel-frequency cepstral coefficients (MFCCs) used to preprocess the audio recordings. Four emotion-labeled audio datasets (RAVDESS, CREMA, TESS, SAVEE) were collected from open sources. Each recording is 3 s long, and the combined dataset contains 668,376 audio files: happy (175,119), sad (172,611), angry (176,346), and normal (144,300). The input audio was preprocessed to generate MFCC features. The study first extracted features using multiple pre-trained DL models as base models, and then evaluated the federated learning (FL) model on independent and identically distributed (IID) and non-IID data. The resulting federated deep learning-based multimodal system classifies emotions in verbal communication from audio data while meeting data privacy requirements within the FL ecosystem. The findings show that the FL-trained model yields results nearly equivalent to those of base-model training. For IID data, the model achieved 99.71% validation accuracy, 99.73% precision, 99.69% recall, and a validation loss of 0.01. With non-IID data, the FL architecture exceeded these figures, reaching 99.97% validation accuracy, 99.97% precision, 99.97% recall, and the lowest loss (approximately 0). These results support training models in a federated learning ecosystem on both identically and non-identically distributed audio features for emotion identification without sacrificing performance. In conclusion, the proposed techniques could be applied to identify verbal emotional deficits in individuals and could support the development of technological interventions for their well-being.
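The abstract does not include implementation details, but the MFCC preprocessing step it describes can be sketched as below. This is a minimal illustration assuming librosa for audio loading and feature extraction; the sampling rate, number of coefficients, and padding strategy are illustrative choices, not values reported in the paper.

```python
# Hedged sketch: MFCC feature extraction for fixed-length (3 s) emotion clips.
# Assumes librosa; n_mfcc=40 and sr=22050 are illustrative, not from the paper.
import numpy as np
import librosa

def extract_mfcc(path, sr=22050, duration=3.0, n_mfcc=40):
    """Load a clip, trim/pad it to `duration` seconds, and return an MFCC vector."""
    audio, _ = librosa.load(path, sr=sr, duration=duration)
    target_len = int(sr * duration)
    if len(audio) < target_len:                     # pad short clips with silence
        audio = np.pad(audio, (0, target_len - len(audio)))
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                        # one averaged coefficient vector per clip
```

Averaging the coefficients over time is only one possible pooling choice; the full time-frequency MFCC matrix could equally be fed to a convolutional base model.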
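Similarly, the federated training procedure can be illustrated with a generic federated averaging (FedAvg) sketch. The abstract does not specify the aggregation rule or how clients were partitioned for the IID and non-IID settings; the code below (with a hypothetical `local_train` callback) only shows the general round structure in which clients train locally on their own MFCC features and share model weights, never raw audio.

```python
# Hedged sketch of federated averaging over client audio shards (NumPy weight lists).
import numpy as np

def fedavg(client_weights, client_sizes):
    """Size-weighted average of per-client weights (each client: a list of arrays)."""
    total = sum(client_sizes)
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

def run_round(global_weights, clients, local_train):
    """One FL round: broadcast weights, train locally, aggregate the updates."""
    updates, sizes = [], []
    for data in clients:                  # each client holds only its own MFCC features
        w, n = local_train(global_weights, data)   # hypothetical local training step
        updates.append(w)
        sizes.append(n)
    return fedavg(updates, sizes)
```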
