Abstract

We provide a short review of recent and near-future developments in the computational processing of emotion in the voice, highlighting (a) self-learning of representations, moving steadily away from traditional expert-crafted or brute-forced feature representations toward end-to-end learning; (b) a movement toward coupling the analysis and synthesis of emotional voices to foster better mutual understanding; (c) weakly supervised learning at large scale; (d) transfer learning from related domains such as speech recognition, including cross-modal transfer learning; and (e) reinforcement learning through large-scale interactive applications. For each of these trends, we briefly explain its implications and potential uses, such as interpretation in psychological studies and deployment in digital health and digital psychology applications. We also discuss further potential developments.
