Abstract

The increasing spread of pervasive technology has led to the rapid development of human-centered connected systems, such as cloud-based voice services, assisted driving systems, domotics control systems, and personal digital assistants. The user interacts with these systems by speaking to an artificial intelligence, which interprets the speaker's requests and makes decisions accordingly. In this scenario, the real-time collection of personal information from the speaker's voice is a key capability for offering personalized services. Gender is part of the basic information needed to customize the user experience. Furthermore, knowledge of the speaker's gender also proves useful in automatic speaker recognition and voice-based identity recognition systems, since it restricts the search space to individuals of one gender, thus speeding up the system response. Gender recognition techniques based on speech analysis have therefore attracted considerable attention from researchers. Speech analysis is usually performed by extracting features from the speech signal that can be affected by factors other than gender: the speaker's emotional state, for example, is conveyed in speech by altering some of the same parameters involved in gender recognition. At the same time, the outcome of emotion recognition systems based on speech analysis can be affected by the speaker's gender. This paper briefly summarizes the techniques used to perform gender recognition through speech analysis and proposes an approach for taking gender into account in emotion recognition methods.
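To make the feature-extraction step concrete, the sketch below estimates the fundamental frequency (F0) of a voiced frame via autocorrelation and applies a simple threshold as a naive gender cue. The 165 Hz threshold and the use of F0 alone are illustrative assumptions, not the method described in the paper; practical systems combine several acoustic features with a trained classifier.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame via autocorrelation.

    Illustrative sketch only: real gender recognition systems typically
    combine F0 with spectral features (e.g. MFCCs) and a trained model.
    """
    frame = frame - np.mean(frame)                  # remove DC offset
    corr = np.correlate(frame, frame, mode="full")  # full autocorrelation
    corr = corr[len(corr) // 2:]                    # keep non-negative lags

    lag_min = int(sr / fmax)                        # shortest plausible period
    lag_max = int(sr / fmin)                        # longest plausible period
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / lag

def naive_gender_cue(f0, threshold_hz=165.0):
    """Toy decision rule: average male F0 is roughly 85-180 Hz and female
    roughly 165-255 Hz, so 165 Hz is used here as an assumed cut-off."""
    return "female" if f0 >= threshold_hz else "male"

# Usage example with a synthetic 120 Hz tone standing in for a voiced frame.
sr = 16000
t = np.arange(0, 0.04, 1 / sr)                      # 40 ms frame
frame = np.sin(2 * np.pi * 120.0 * t)
f0 = estimate_f0(frame, sr)
print(f"estimated F0 = {f0:.1f} Hz -> {naive_gender_cue(f0)}")
```

As the abstract notes, emotional state modulates F0 and related parameters, which is why a cue like the one above can be unreliable and why gender and emotion recognition are best treated jointly.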
