Abstract

The automatic synthesis of realistic gestures has the potential to transform animation, avatars, and communicative agents. Although speech-driven gesture generation methods have been proposed and refined, the evaluation of synthesized gestures still lags behind. Current evaluation methods rely on manual participation, which is inefficient for the gesture-synthesis industry and introduces human bias. An automatic, objective model for the quantitative quality assessment of synthesized gesture videos is therefore needed. Recurrent neural networks (RNNs) excel at modeling high-level spatiotemporal feature sequences, which makes them well suited to processing synthetic gesture video data. Building on this, we propose an automatic quality assessment model based on a Bi-LSTM with a modified attention mechanism. We also define the evaluation protocol and design experiments showing that the adjusted model can perform quantitative evaluation of synthesized gestures, with a performance improvement of about 20% over the model before the adjustment.
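To make the described architecture concrete, the following is a minimal sketch of a Bi-LSTM encoder with attention pooling that maps a sequence of per-frame gesture features to a scalar quality score. It is not the paper's actual model: the framework (PyTorch), the additive-attention formulation, and all names and dimensions are illustrative assumptions.

```python
# Sketch only: Bi-LSTM + attention pooling for sequence-level quality scoring.
# All dimensions, names, and the use of PyTorch are assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn


class BiLSTMQualityScorer(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128):
        super().__init__()
        # Bidirectional LSTM over the per-frame gesture feature sequence.
        self.encoder = nn.LSTM(feat_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Additive attention: one unnormalized relevance score per time step.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # Regression head producing a single quality score per clip.
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):
        # x: (batch, frames, feat_dim) sequence of gesture features.
        h, _ = self.encoder(x)                       # (batch, frames, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over frames
        pooled = (weights * h).sum(dim=1)            # weighted sequence summary
        return self.head(pooled).squeeze(-1)         # (batch,) quality scores


if __name__ == "__main__":
    model = BiLSTMQualityScorer()
    clips = torch.randn(4, 120, 64)   # 4 clips, 120 frames, 64-d features
    print(model(clips).shape)         # torch.Size([4])
```

The attention weights give the model a learned, per-frame weighting of the Bi-LSTM states before pooling, which is the kind of component the abstract's "adjustment on the attention mechanism" would modify.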
