Abstract

This paper proposes a novel speech emotion recognition (SER) framework for affective interaction between human and personal devices. Most of the conventional SER techniques adopt a speaker-independent model framework because of the sparseness of individual speech data. However, a large amount of individual data can be accumulated on a personal device, making it possible to construct speaker-characterized emotion models in accordance with a speaker adaptation procedure. In this study, to address problems associated with conventional adaptation approaches in SER tasks, we modified a representative adaptation technique, maximum likelihood linear regression (MLLR), on the basis of selective label refinement. We subsequently carried out the modified MLLR procedure in an online and iterative manner, using accumulated individual data, to further enhance the speaker-characterized emotion models. In the SER experiments based on an emotional corpus, our approach exhibited performance superior to that of conventional adaptation techniques as well as the speaker-independent model framework.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.