Abstract

We provide a short review of recent and near-future developments in the computational processing of emotion in the voice, highlighting (a) self-learning of representations, moving steadily away from traditional expert-crafted or brute-forced feature representations toward end-to-end learning; (b) a movement toward coupling the analysis and synthesis of emotional voices to foster better mutual understanding; (c) weakly supervised learning at large scale; (d) transfer learning from related domains such as speech recognition, including cross-modal transfer learning; and (e) reinforcement learning through large-scale interactive applications. For each of these trends, we briefly explain its implications and potential uses, such as interpretation in psychological studies and deployment in digital health and digital psychology applications. We also discuss further potential developments.
