Abstract

In music-driven dance movement generation, existing music-movement matching models and statistical mapping models fit the generated dance to the music poorly: the generated movements are incomplete, long-term dance sequences lack smoothness and plausibility, and traditional models cannot produce novel dance moves. To address these issues, we design a movement- and neural-network-based dance generation algorithm that learns the mapping between audio and motion features. In the first stage, prosody and beat features extracted from the music serve as music features, and the coordinates of human-body key points extracted from dance videos serve as motion features for training. In the second stage, the generator module of the model learns the basic mapping from music to dance movements and produces smooth dance poses; the discriminator module enforces consistency between the dance and the music; and the autoencoder module makes the audio features more representative. In the third and final stage, a modified version of the model transforms the dance pose sequence into a realistic dance video, yielding a lifelike dance that fits the music. The experimental data are collected from dance videos on the Internet, and the results are analyzed from five aspects: loss function values, comparison against different baselines, evaluation of sequence generation quality, a user study, and quality evaluation of the rendered realistic dance videos. The results show that the proposed dance generation model performs well in producing realistic dance videos.
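To make the second-stage architecture concrete, the following is a minimal PyTorch sketch of one training step, assuming GRU-based modules; the module names, feature dimensions, and loss weights are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    MUSIC_DIM, LATENT_DIM, N_KEYPOINTS = 35, 64, 18  # assumed sizes

    class AudioAutoencoder(nn.Module):
        """Compresses per-frame music features into a more representative code."""
        def __init__(self):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(MUSIC_DIM, LATENT_DIM), nn.ReLU())
            self.dec = nn.Linear(LATENT_DIM, MUSIC_DIM)

        def forward(self, x):
            z = self.enc(x)
            return z, self.dec(z)

    class Generator(nn.Module):
        """Maps the encoded music sequence to per-frame 2D keypoint coordinates."""
        def __init__(self):
            super().__init__()
            self.rnn = nn.GRU(LATENT_DIM, 128, batch_first=True)
            self.head = nn.Linear(128, N_KEYPOINTS * 2)

        def forward(self, z_seq):
            h, _ = self.rnn(z_seq)
            return self.head(h)  # (batch, time, 2 * N_KEYPOINTS)

    class Discriminator(nn.Module):
        """Scores how consistent a pose sequence is with its music sequence."""
        def __init__(self):
            super().__init__()
            self.rnn = nn.GRU(LATENT_DIM + N_KEYPOINTS * 2, 128, batch_first=True)
            self.head = nn.Linear(128, 1)

        def forward(self, z_seq, pose_seq):
            h, _ = self.rnn(torch.cat([z_seq, pose_seq], dim=-1))
            return torch.sigmoid(self.head(h[:, -1]))  # one score per clip

    def training_step(G, D, AE, music, real_pose, opt_g, opt_d):
        bce, mse = nn.BCELoss(), nn.MSELoss()
        ones = torch.ones(music.size(0), 1)
        zeros = torch.zeros(music.size(0), 1)
        z, recon = AE(music)
        fake_pose = G(z)
        # Discriminator step: real music-dance pairs vs. generated ones.
        opt_d.zero_grad()
        d_loss = bce(D(z.detach(), real_pose), ones) + \
                 bce(D(z.detach(), fake_pose.detach()), zeros)
        d_loss.backward()
        opt_d.step()
        # Generator/autoencoder step: fool D, reconstruct the audio features,
        # and keep consecutive poses close (an assumed smoothness term).
        opt_g.zero_grad()
        smooth = (fake_pose[:, 1:] - fake_pose[:, :-1]).pow(2).mean()
        g_loss = bce(D(z, fake_pose), ones) + mse(recon, music) + 0.1 * smooth
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()

In this sketch, opt_g is assumed to optimize both the generator and the autoencoder parameters, so the reconstruction loss also shapes the shared audio representation; the velocity penalty stands in for the smoothness objective described in the abstract.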

Highlights

  • With the development and popularization of deep learning in recent years, artificial neural networks have been successfully applied to the generation of dance movements

  • For dance visualization, existing work typically draws the human skeleton or applies animation directly from the coordinates of the key points of the human body, leaving room for further improvement in the visualization effect

  • It is necessary to extract continuous dance pose data from a given dance video using human pose estimation technology, and to design a dedicated audio feature encoder to extract audio features from the music that accompanies the dance. The dance data reflects how the coordinates of the key points of the human body change over time; a minimal extraction sketch follows below
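As a rough illustration of this data-preparation step, the sketch below derives per-frame energy, pitch, and beat features with librosa and loads keypoints precomputed by an external pose estimator; the paper's exact feature set and encoder design are not reproduced here, so the feature choices and array shapes are assumptions.

    import numpy as np
    import librosa

    def extract_music_features(audio_path, hop_length=512):
        """Per-frame prosody (energy, pitch) and beat features for one track."""
        y, sr = librosa.load(audio_path, sr=None)
        rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]       # energy
        f0 = librosa.yin(y, fmin=65.0, fmax=2093.0, sr=sr,
                         hop_length=hop_length)                        # pitch
        _, beats = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)
        beat_flag = np.zeros(rms.shape[0])            # 1.0 on beat frames
        beat_flag[beats[beats < beat_flag.shape[0]]] = 1.0
        n = min(rms.shape[0], f0.shape[0])            # align frame counts
        return np.stack([rms[:n], f0[:n], beat_flag[:n]], axis=1)  # (frames, 3)

    def load_pose_sequence(npy_path):
        """Keypoints precomputed by a pose estimator such as OpenPose;
        assumed shape: (frames, joints, 2) pixel coordinates."""
        return np.load(npy_path)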


Summary

Related Work

Current Research Status. Cross-modal generation from audio to video can be divided into three categories: body motion generation, audio-driven image generation, and talking-face video generation. Synthesizing a face video that corresponds to speech or music is a typical cross-modal generation task. In one representative approach, one network maps the input audio to a set of lip landmarks, while the other is a conditional GAN that generates facial images from that set of lip landmarks; together, these two networks can generate a natural talking-face sequence synchronized with the input audio track. Their model is trained in a self-supervised manner by exploiting the natural alignment of audio and video features within each video. Another type of cross-modal generation task is to generate the corresponding speech video from voice or text end to end, without relying on manually specified rules. Some researchers combined acoustic analysis with text [10], demonstrating a method for generating 3D virtual humans from audio signals by inferring the acoustic and semantic characteristics of speech. Other researchers, through self-supervised training on speech videos [11], generated the corresponding speaking gestures for any given speaker without adding any semantic information and synthesized realistic speech videos
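The conditional-GAN component mentioned above can be sketched as follows; this is not the cited authors' architecture, only a minimal illustration of conditioning both the generator and the discriminator on lip landmarks, with all layer sizes and the landmark count assumed.

    import torch
    import torch.nn as nn

    N_LANDMARKS = 20  # assumed number of 2D lip landmarks

    class LandmarkConditionedGenerator(nn.Module):
        """Synthesizes a 64x64 grayscale face conditioned on lip landmarks."""
        def __init__(self, noise_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(noise_dim + N_LANDMARKS * 2, 256), nn.ReLU(),
                nn.Linear(256, 64 * 64), nn.Tanh(),
            )

        def forward(self, noise, landmarks):
            x = torch.cat([noise, landmarks.flatten(1)], dim=1)
            return self.net(x).view(-1, 1, 64, 64)

    class PairDiscriminator(nn.Module):
        """Judges whether a (face image, lip landmarks) pair is a real match."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(64 * 64 + N_LANDMARKS * 2, 256), nn.LeakyReLU(0.2),
                nn.Linear(256, 1), nn.Sigmoid(),
            )

        def forward(self, image, landmarks):
            x = torch.cat([image.flatten(1), landmarks.flatten(1)], dim=1)
            return self.net(x)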
