Abstract

The goal of this work is to develop a web application dedicated to evaluating intonation in emotionally charged speech and singing, recorded according to a given scenario. The starting point is a systematic literature review of existing corpora of emotionally charged speech and singing and their accessibility. Based on this review, the design assumptions of the web-based application are presented, along with the choice of emotions to be expressed and the words/texts to be recorded. The resulting corpus is multimodal: four modalities are used in the recordings, i.e., audio, video, Facial Motion Capture (FMC) system sensors, and a high-speed camera. The recordings are available as separate audio and video signals at 25 fps (Canon), 120 fps (Vicon), and 200 fps (GoPro), as combined audio and video signals, and as recordings from the FMC sensor system (C3D files). Importantly, the corpus contains recordings by both professional and amateur actors expressing a neutral state, joy, sadness, and anger. The recorded speech and singing signals are uploaded to a server and then prepared for intonation analysis employing deep neural networks.
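The abstract does not specify how the uploaded signals are prepared for intonation analysis. As an illustrative sketch only, assuming a Python toolchain with librosa, extracting a fundamental-frequency (F0) contour, the acoustic correlate of intonation, from an uploaded recording could look like this; the file name and pitch ranges are hypothetical:

```python
# Illustrative sketch only: the paper does not describe its actual
# preprocessing pipeline. Assumes a Python environment with librosa.
import librosa
import numpy as np

def extract_f0_contour(path: str, sr: int = 16000) -> np.ndarray:
    """Estimate the fundamental-frequency (F0) contour of a recording."""
    y, sr = librosa.load(path, sr=sr)
    f0, voiced_flag, _ = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),  # ~65 Hz, below typical speech F0
        fmax=librosa.note_to_hz("C6"),  # ~1047 Hz, covers sung pitch as well
        sr=sr,
    )
    return f0  # NaN entries mark unvoiced frames

# Hypothetical usage: the contour could then be normalized or interpolated
# and fed to a deep neural network for intonation analysis.
contour = extract_f0_contour("recording.wav")
```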
