Abstract

In this paper, we classify speech into several emotional states based on the statistical properties of prosody features estimated on utterances extracted from the Danish Emotional Speech (DES) and a subset of the Speech Under Simulated and Actual Stress (SUSAS) data collections. The proposed novelties are: 1) speeding up sequential floating feature selection by up to 60%; 2) fusing decisions taken on short speech segments in order to derive a unique decision for longer utterances; and 3) demonstrating that gender and accent information reduces the classification error. Indeed, the classification error is reduced by 1% to 11% when decisions are combined over long phrases, and by 2% to 11% when gender and accent information is exploited. The total classification error reported on DES is 42.8%; the corresponding figure on SUSAS is 46.3%. The reported human errors are 32.3% on DES and 42% on SUSAS. For comparison, a random classification would yield errors of 80% on DES and 87.5% on SUSAS.
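To make the second novelty concrete, the sketch below shows one simple way to fuse per-segment decisions into a single utterance-level label: a majority vote over segment-level hard decisions, with a posterior-sum tie-break. The segment classifier, the number of emotion classes, and the specific fusion rule here are illustrative assumptions, not the exact scheme specified in the paper.

```python
# Illustrative sketch only: fusing per-segment emotion decisions into one
# utterance-level decision via majority voting (assumed fusion rule).
from collections import Counter
from typing import Sequence

import numpy as np


def fuse_segment_decisions(segment_posteriors: Sequence[np.ndarray]) -> int:
    """Fuse per-segment class posteriors into one utterance-level decision.

    segment_posteriors: one 1-D array per short speech segment, each holding
    the classifier's posterior probability for every emotion class.
    Returns the index of the winning class.
    """
    # Hard decision per segment: the class with the highest posterior.
    votes = [int(np.argmax(p)) for p in segment_posteriors]
    top = Counter(votes).most_common()

    # If one class clearly wins the vote, use it.
    if len(top) == 1 or top[0][1] > top[1][1]:
        return top[0][0]

    # Tie-break: sum the posteriors over all segments and pick the maximum.
    summed = np.sum(np.stack(segment_posteriors), axis=0)
    return int(np.argmax(summed))


if __name__ == "__main__":
    # Three short segments of one utterance, four hypothetical emotion classes.
    rng = np.random.default_rng(0)
    posteriors = [rng.dirichlet(np.ones(4)) for _ in range(3)]
    print("utterance-level class:", fuse_segment_decisions(posteriors))
```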
