Abstract

In this paper, we investigate cross-lingual automatic speech emotion recognition. The basic idea is that, since the emotion recognition system relies on acoustic features only, data in different languages can be combined to improve recognition accuracy. We begin by constructing a Mandarin database of emotional speech that is similar in composition and size to the well-known Berlin Database of Emotional Speech (EMO-DB). To reduce the variability due to different languages and different speakers, we propose applying histogram equalization as a data normalization method. Recognition systems based on support vector machines have been evaluated on EMO-DB. Compared to the baseline system without multi-lingual data and data normalization, the proposed system achieves a relative error reduction of 39.9%, raising the emotion recognition accuracy from 86.2% to 91.7%. This accuracy is among the best results reported on EMO-DB, if not the best.
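The histogram equalization mentioned above can be sketched as a per-dimension transform that maps each acoustic feature of one corpus onto the empirical distribution of a reference corpus, reducing cross-corpus mismatch. The function below is a minimal illustrative sketch (not the authors' implementation); the array shapes and the rank-to-quantile mapping are assumptions.

```python
import numpy as np

def histogram_equalize(source, target):
    """Per-dimension histogram equalization (illustrative sketch).

    Maps each feature dimension of `source` so that its empirical
    distribution matches that of `target`. Both inputs are assumed
    to be (n_samples, n_dims) feature matrices.
    """
    out = np.empty_like(source, dtype=float)
    for d in range(source.shape[1]):
        s = source[:, d]
        t = np.sort(target[:, d])
        # Empirical CDF value (mid-rank) of each source sample.
        ranks = np.argsort(np.argsort(s))
        cdf = (ranks + 0.5) / len(s)
        # Replace each sample with the target quantile at its CDF value.
        out[:, d] = np.quantile(t, cdf)
    return out
```

Because the mapping is monotone per dimension, the relative ordering of feature values within a corpus is preserved while their marginal distribution is aligned with the reference.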
