Abstract
This study investigates a speech emotion recognition method that uses both acoustic and linguistic features. Various emotion recognition methods that combine these two feature types have been proposed. However, most studies that use linguistic features rely on reference transcripts rather than speech recognition output, because recognizing emotional speech is considered more difficult than recognizing non-emotional speech: the acoustic features of emotional speech differ from those of non-emotional speech and vary greatly with the type and intensity of the emotion. We have been studying an emotional speech recognition method that combines acoustic model adaptation with language model adaptation, and it has achieved high recognition performance on an emotional speech task. In this study, we attempt to extract linguistic features from the speech recognition results. The word recognition accuracy of the system was 82.2%, so the recognition results contained errors. Despite these errors, the linguistic features extracted from the recognition results were useful, and we demonstrate that combining linguistic and acoustic features is effective for emotion recognition.