Modulation Spectral Features Research Articles

In this paper, we propose a global approach to speech emotion recognition (SER) using empirical mode decomposition (EMD). Its use is motivated by the fact that EMD combined with the Teager-Kaiser Energy Operator (TKEO) gives an efficient time-frequency analysis of non-stationary signals. In this method, each signal is decomposed using EMD into oscillating components called intrinsic mode functions (IMFs). TKEO is used to estimate the time-varying amplitude envelope and instantaneous frequency of a signal that is assumed to be an Amplitude Modulation-Frequency Modulation (AM-FM) signal. A subset of the IMFs is selected and used to extract features from the speech signal in order to recognize different emotions. The main contribution of our work is to extract novel features, named modulation spectral (MS) features and modulation frequency features (MFF), based on the AM-FM modulation model and to combine them with cepstral features. We expect the combination of all features to improve the performance of the emotion recognition system. Furthermore, we examine the effect of feature selection on SER system performance. For the classification task, Support Vector Machine (SVM) and Recurrent Neural Networks (RNN) are used to distinguish seven basic emotions. Two databases, the Berlin corpus and the Spanish corpus, are used for the experiments. On the Spanish emotional database, the RNN classifier with a combination of all features extracted from the IMFs achieves a 91.16% recognition rate. On the Berlin database, the combination of all features with the SVM classifier achieves an 86.22% recognition rate.
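As a rough illustration of the energy operator mentioned in this abstract, the sketch below applies the standard discrete TKEO, psi[n] = x[n]^2 - x[n-1] * x[n+1], to a pure cosine. It is not the authors' pipeline: there is no EMD step and no envelope or frequency separation, and the function name `tkeo` and the test tone are assumptions made for the example. For a tone A*cos(omega*n), the operator output is the constant (A*sin(omega))^2, which is what makes it usable for tracking amplitude and frequency.

```python
import math


def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone: amplitude 2.0, digital frequency 0.3 rad/sample.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(200)]

psi = tkeo(tone)

# For a pure tone the operator output is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

In an EMD-based system, an operator of this kind would presumably be applied to each selected IMF rather than to the raw signal.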


The current study presents an analysis of the robustness of a speech detector in real background sounds. One of the most important aspects of automatic speech/nonspeech classification is robustness in the presence of strongly varying external conditions. These include variations of the signal-to-noise ratio as well as fluctuations of the background noise. These variations are systematically evaluated by choosing different mismatched conditions between training and testing of the speech/nonspeech classifiers. The detection performance of the classifier with respect to these mismatched conditions is used as a measure of robustness and generalisation. The generalisation towards un-trained SNR conditions and unknown background noises is evaluated and compared to a matched baseline condition. The classifier consists of a feature front-end, which computes amplitude modulation spectral features (AMS), and a support vector machine (SVM) back-end. The AMS features are based on Fourier decomposition over time of short-term spectrograms. Mel-frequency cepstral coefficients (MFCC) as well as relative spectral features (RASTA) based on perceptual linear prediction (PLP) serve as baseline. The results show that RASTA-filtered PLP features perform best in the matched task. In the generalisation tasks however, the AMS features emerge as more robust in most cases, while MFCC features are outperformed by both other feature types. In a second set of experiments, a hierarchical approach is analysed which employs a background classification step prior to the speech/nonspeech classifier in order to improve the robustness of the detection scores in novel backgrounds. The background sounds used are recorded in typical everyday scenarios. The hierarchy provides a benefit in overall performance if the robust AMS features are employed. 
The generalisation capabilities of the hierarchy towards novel backgrounds and SNRs are found to be optimal when a limited number of training backgrounds is used (compared to the inclusion of all available background data). The best backgrounds in terms of generalisation capabilities are those in which some component of speech (such as unintelligible background babble) is present, which corroborates the hypothesis that the AMS features provide a decomposition of signals that is by itself very suitable for training very general speech/nonspeech detectors. This is also supported by the finding that the SVMs combined with RASTA-PLPs require nonlinear kernels to reach a performance similar to that of the AMS patterns with linear kernels.
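The idea of a Fourier decomposition over time of short-term spectral energy can be sketched in a few lines. The example below is a deliberate simplification and not the AMS front-end of the study: it collapses the spectrogram to a single broadband energy trajectory (frame-wise RMS), then takes a plain DFT of that trajectory along the frame axis. The helper names, the 10 ms frame length, and the synthetic 4 Hz amplitude-modulated tone are all assumptions chosen for illustration.

```python
import cmath
import math


def frame_rms(signal, frame_len):
    """Short-term RMS energy over non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return [
        math.sqrt(sum(s * s for s in signal[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]


def modulation_spectrum(envelope, frame_rate):
    """Magnitude DFT of the mean-removed envelope along the frame (time) axis."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
        freqs.append(k * frame_rate / n)  # modulation frequency in Hz
        mags.append(abs(acc) / n)
    return freqs, mags


# Hypothetical input: a 1 kHz carrier, amplitude-modulated at 4 Hz, 2 s at 8 kHz.
fs = 8000
frame_len = 80  # 10 ms frames, i.e. a frame rate of 100 Hz
signal = [
    (1 + 0.5 * math.cos(2 * math.pi * 4 * n / fs)) * math.sin(2 * math.pi * 1000 * n / fs)
    for n in range(2 * fs)
]

envelope = frame_rms(signal, frame_len)
freqs, mags = modulation_spectrum(envelope, fs / frame_len)
peak_hz = freqs[mags.index(max(mags))]  # dominant modulation frequency
```

A fuller front-end would repeat the second step separately for each frequency band of the spectrogram, giving a two-dimensional pattern over acoustic frequency and modulation frequency.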


Articles published on Modulation Spectral Features

Advances in Speech Emotion Recognition and Analysis: A Review of Applied Machine Learning Methodologies

COVID-19 Detection via Fusion of Modulation Spectrum and Linear Prediction Speech Features

Modulation spectral features for speech emotion recognition using deep neural networks

Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments

Novel feature representation using single frequency filtering and nonlinear energy operator for speech emotion recognition

Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference

Audio based Emotion Detection and Recognizing Tool Using Mel Frequency based Cepstral Coefficient

Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO

Contribution of modulation spectral features on the perception of vocal-emotion using noise-vocoded speech

Study on the relationship between modulation spectral features and the perception of vocal emotion with noise-vocoded speech

Fusion of bottleneck, spectral and modulation spectral features for improved speaker verification of neutral and whispered speech

Modulation Spectral Features: In Pursuit of Invariant Representations of Music with Application to Unsupervised Source Identification

Residual Life Prediction of Rotating Machines Using Acoustic Noise Signals

Voice Pathology Detection and Discrimination Based on Modulation Spectral Features

Msf-Based Speaker Automatic Emotional Recognition In Continuous Chinese Mandarin

Discrimination of speech from nonspeech in broadcast news based on modulation frequency features

Automatic speech emotion recognition using modulation spectral features

Robust speech detection in real acoustic backgrounds with perceptually motivated features

Modulation Spectral Features for Robust Far-Field Speaker Identification
