Abstract

This paper considers the integration of automated emotion recognition into videoconferencing technology for remote content delivery, such as transport communication systems, polls, lectures, and psychotherapy sessions. A platform-agnostic peer-to-peer architecture was developed to establish remote communication sessions. Convolutional neural networks process the stream at the operator end and estimate the customer's emotional feedback. The emotional state is determined from three individual modalities (video, audio, text), as well as by multimodal recognition. Experiments were performed with 10 pairs of participants: in each pair, one acted as an operator and asked closed questions, while the other answered them. The neural network shows the following average accuracy for the individual modalities: video 76 %, audio 57 %. The best result is achieved by multimodal recognition (average accuracy of 80 %). These findings confirm the effectiveness of multimodal recognition for classifying human emotions in videoconferencing systems.
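The abstract does not specify how the three modalities are combined. A minimal sketch of one common approach, late fusion by weighted averaging of per-modality class probabilities, is shown below; the label set, the example probability vectors, and the weights are all illustrative assumptions, not values from the paper:

```python
EMOTIONS = ["neutral", "happy", "sad", "angry"]  # illustrative label set, not from the paper

def fuse_late(modality_probs, weights):
    """Late fusion: weighted average of per-modality class probability vectors."""
    fused = [0.0] * len(EMOTIONS)
    for probs, w in zip(modality_probs, weights):
        for i, p in enumerate(probs):
            fused[i] += w * p
    total = sum(fused)
    return [p / total for p in fused]  # renormalize to a probability distribution

# Hypothetical softmax outputs of the per-modality classifiers for one exchange
p_video = [0.10, 0.70, 0.10, 0.10]
p_audio = [0.30, 0.40, 0.20, 0.10]
p_text  = [0.25, 0.45, 0.15, 0.15]

# Assumed weights reflecting that video was the strongest single modality
fused = fuse_late([p_video, p_audio, p_text], weights=[0.5, 0.25, 0.25])
label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(label)  # → happy
```

With such a scheme, a modality that is noisy for a given exchange (e.g. audio during cross-talk) is outvoted by the others, which is consistent with multimodal recognition outperforming any single modality in the reported experiments.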
