Abstract
The emotional state of the user is becoming a crucial component of sophisticated Human-Computer Interfaces (HCI), and it is closely linked to emotional speech recognition. Spoken expressions, a natural part of human-machine interaction, are an important source of emotional information. Speech emotion recognition (SER) with deep learning (DL) remains a hot topic, especially in affective computing, where DL and neural network methods are actively applied owing to their expanding potential, algorithmic advances, and practical uses. Quantitative features such as pitch, intensity, accent, and Mel-Frequency Cepstral Coefficients (MFCC) can be employed to model the paralinguistic information contained in human speech. SER is typically implemented in three key steps: data processing, feature selection/extraction, and classification based on the underlying emotional cues. The nature of these steps and the peculiarities of human speech support the use of DL techniques for SER. A variety of DL methods have been applied to SER tasks in recent affective-computing research; however, only a few works capture the underlying ideas and methodologies that can facilitate the three main steps of SER implementation. Focusing on these three steps, this work provides a state-of-the-art review of research conducted over the last ten years that tackled SER tasks from a DL perspective. Several issues are covered in detail, including the low classification accuracy of speaker-independent experiments and the related remedies. The review also offers principles for SER evaluation, emphasizing indicators that can be experimented with and common baselines.
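To make the three-step pipeline concrete, below is a minimal sketch of an SER workflow, not a method from the surveyed literature: it assumes 16 kHz mono WAV input, uses librosa to extract the MFCC, pitch, and intensity features mentioned above, and a small PyTorch network as the classifier. The emotion label set and file path are hypothetical placeholders.

```python
# Illustrative SER pipeline sketch: data processing -> feature extraction -> classification.
import librosa
import numpy as np
import torch
import torch.nn as nn

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set


def extract_features(path, sr=16000, n_mfcc=13):
    """Steps 1-2: load/resample the utterance, then extract MFCC,
    pitch (F0 via YIN), and intensity (frame-level RMS energy)."""
    y, sr = librosa.load(path, sr=sr)                       # data processing: resample to 16 kHz mono
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)           # pitch contour, shape (frames,)
    rms = librosa.feature.rms(y=y)[0]                       # intensity per frame

    # Pool frame-level features into one fixed-length utterance vector
    # using per-coefficient mean and standard deviation.
    def stats(m):
        return np.concatenate([m.mean(axis=-1).ravel(), m.std(axis=-1).ravel()])

    return np.concatenate([stats(mfcc), stats(f0[None, :]), stats(rms[None, :])])


class SERClassifier(nn.Module):
    """Step 3: a small feed-forward network mapping the pooled
    feature vector to emotion-class logits."""

    def __init__(self, in_dim, n_classes=len(EMOTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.net(x)


# Usage (path is a placeholder; a real model would first be trained on labeled data):
# x = torch.tensor(extract_features("utterance.wav"), dtype=torch.float32)
# model = SERClassifier(in_dim=x.numel())
# probs = torch.softmax(model(x.unsqueeze(0)), dim=-1)
```

The surveyed DL approaches typically replace the hand-pooled statistics above with learned representations (e.g., CNN or recurrent layers over MFCC or spectrogram frames), but the three-step structure of the pipeline remains the same.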