Effective modelling of human expressive states from voice by adaptively tuning the neuro-fuzzy inference system

Surjyo Narayana Panigrahi, Niharika Pattanaik, Hemanta Kumar Palo

doi:10.11591/ijai.v13.i1.pp185-194


Open Access



Abstract

This paper aims to develop efficient speech-expressive models using an adaptively tuned neuro-fuzzy inference system (ANFIS). The developed models differentiate the high-arousal happiness state from the low-arousal sadness state using the benchmark Berlin (EMODB) database. The proposed low-cost, flexible algorithms are self-tunable and can address a range of real-world applications such as home tutoring, banking and finance, criminal investigation, psychological studies, call centers, and the cognitive and biomedical sciences. The work develops the proposed structures by formulating several novel feature vectors comprising both time and frequency information. The features considered are pitch (F0), the standard deviation of pitch (SDF0), autocorrelation coefficient (AC), log-energy (E), jitter, shimmer, harmonic-to-noise ratio (HNR), spectral centroid (SC), spectral roll-off (SR), spectral flux (SF), and zero-crossing rate (ZCR). To alleviate the curse of dimensionality associated with frame-level extraction, the features are extracted at the utterance level. Several performance parameters are computed to validate the individual time and frequency models. Further, the ANFIS models are tested for their efficacy in a combinational platform, where the time and frequency features are used together. The chosen features are complementary, and our results show that the augmented vectors improve performance because they carry more information.
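As a rough illustration of utterance-level extraction, the sketch below computes four of the listed features (ZCR, log-energy, spectral centroid, and spectral roll-off) once over a whole utterance rather than per frame. It is not the authors' implementation: the function name `utterance_features`, the 85% roll-off fraction, and the exact feature definitions are assumptions chosen for illustration, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def utterance_features(signal, sample_rate, rolloff_fraction=0.85):
    """Compute a few utterance-level descriptors for a mono signal.

    Hypothetical helper: definitions and the 0.85 roll-off fraction are
    illustrative assumptions, not values taken from the paper.
    """
    n = len(signal)

    # Zero-crossing rate: share of adjacent sample pairs whose sign differs.
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings / (n - 1)

    # Log-energy over the whole utterance (small offset avoids log(0)).
    log_energy = math.log(sum(s * s for s in signal) + 1e-12)

    # Magnitude spectrum via a naive DFT for bins 0..n/2 (O(n^2); a real
    # pipeline would use an FFT routine instead).
    bins = range(n // 2 + 1)
    magnitudes = [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(signal)))
        for k in bins
    ]
    freqs = [k * sample_rate / n for k in bins]

    # Spectral centroid: magnitude-weighted mean frequency.
    total = sum(magnitudes) + 1e-12
    centroid = sum(f * m for f, m in zip(freqs, magnitudes)) / total

    # Spectral roll-off: lowest frequency below which the chosen fraction
    # of the total spectral magnitude is contained.
    threshold = rolloff_fraction * total
    running = 0.0
    rolloff = freqs[-1]
    for f, m in zip(freqs, magnitudes):
        running += m
        if running >= threshold:
            rolloff = f
            break

    return {
        "zcr": zcr,
        "log_energy": log_energy,
        "spectral_centroid": centroid,
        "spectral_rolloff": rolloff,
    }
```

Each utterance yields one short fixed-length vector regardless of its duration, which is how utterance-level extraction keeps the input dimension small compared with stacking per-frame features.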
