Abstract

Human-Computer Interaction has recently received increased attention. This is not just a matter of making the operation of technical systems as simple as possible, but also of enabling interaction that is as natural as possible. In this context, speech-based operation in particular has gained attention. Modern smartphones and televisions, for example, offer robust voice control, which can be attributed to various technical improvements in recent years. Nevertheless, voice control still seems artificial: only self-contained dialogues with short statements can be managed, and only the content of speech is evaluated. The way in which something is said remains unconsidered, although it is known from human communication that the transmitted emotion is important for communicating successfully.

A relatively new branch of research, “Affective Computing”, has, amongst other objectives, the aim of developing technical systems that recognise and interpret emotions and respond to them appropriately. Speech-based automatic emotion recognition plays a major role here. For emotion recognition, it is important to know how emotions can be represented and how they are expressed. For this purpose, it is helpful to rely on empirical evidence from the psychology of emotions. Unfortunately, there is no uniform representation of emotions, and the definition of appropriate emotion-distinctive acoustic features in psychology is rather descriptive. Therefore, the automatic detection of emotions is based on proven methods of automatic speech recognition, which have also been shown to be appropriate for emotion recognition.

Automatic emotion recognition is, like speech recognition, a branch of pattern recognition. Contrary to emotion psychology, it is data-driven, that is, insights are gathered from sampled data. For emotion recognition, the phases “annotation”, “modelling” and “recognition” can be distinguished. Annotation categorises speech data according to predefined emotion terms. Modelling generates recognisers that categorise data automatically. Recognition assigns previously unseen data to emotional classes.

In the beginning, automatic emotion recognition was usually based, due to the lack of suitable data sets, on acted and very expressive emotional expressions. In that setting, using features and detection methods known from speech recognition, very good recognition results of over 80% in distinguishing up to seven emotions could be achieved. However, these recognisers were unsuitable for human-machine interaction, because emotions there are not that expressive. Therefore, in collaboration with
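The data-driven pipeline of annotation, modelling and recognition outlined above can be illustrated with a minimal sketch. It assumes hypothetical annotated audio files, mean MFCCs (computed with librosa) as a simple stand-in for emotion-distinctive acoustic features, and a support-vector classifier from scikit-learn as the recogniser; the actual features and recognition methods used in the work may differ.

# Minimal sketch of an annotation -> modelling -> recognition pipeline.
# File names, labels and the choice of features/classifier are illustrative
# assumptions, not the method of the original work.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Annotation: each utterance is labelled with a predefined emotion term
# (hypothetical file names and labels).
annotated_data = [
    ("utt_001.wav", "anger"),
    ("utt_002.wav", "joy"),
    ("utt_003.wav", "neutral"),
    # ... further annotated utterances
]

def extract_features(wav_path):
    """Mean MFCCs as a simple acoustic feature vector per utterance."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

X = np.array([extract_features(path) for path, _ in annotated_data])
y = np.array([label for _, label in annotated_data])

# Modelling: train a recogniser on the annotated data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
recogniser = SVC(kernel="rbf")
recogniser.fit(X_train, y_train)

# Recognition: assign previously unseen utterances to emotional classes.
predicted = recogniser.predict(X_test)
print("accuracy:", accuracy_score(y_test, predicted))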
