Abstract

This study investigates a speech emotion recognition method that uses both acoustic and linguistic features. Various emotion recognition methods using both types of features have been proposed. However, most studies that use linguistic features rely on reference transcripts, because emotional speech recognition is considered more difficult than non-emotional speech recognition: the acoustic features of emotional speech differ from those of non-emotional speech, and they vary greatly depending on the emotion type and intensity. We have been developing an emotional speech recognition method that combines acoustic model adaptation and language model adaptation, and it has achieved high recognition performance on an emotional speech task. In this study, we attempt to extract linguistic features from speech recognition results rather than from reference transcripts. The word recognition accuracy of the system was 82.2%, so the recognition results contained errors. Despite these errors, the linguistic features extracted from the recognition results proved useful, and we demonstrate that combining linguistic and acoustic features is effective for emotion recognition.

