Abstract

This paper proposes a novel model for synthesizing dance movements from a music/audio sequence, which has a variety of potential applications, e.g., virtual reality. For a given unheard song, generating musically meaningful and natural dance movements requires meeting the following criteria: 1) the rhythm of the dance actions should be in harmony with the music beat; 2) the generated dance movements should exhibit notable and natural variations. Specifically, a sequence-to-sequence (Seq2Seq) learning architecture that leverages Long Short-Term Memory (LSTM) and a Self-Attention (SA) mechanism is proposed for dance generation. The contributions of this work are as follows: 1) a cross-domain Seq2Seq learning framework is proposed for realistic dance generation; 2) a set of evaluation criteria is proposed for evaluating synthesis results for which no reference source is available; 3) a dance dataset that includes both music and the corresponding dance motions is collected, and very competitive results are obtained against state-of-the-art methods.

Highlights

  • Music-driven dance generation: This paper aims to learn the mapping between music and dance movements, and to synthesize musically meaningful and natural dance movements driven by music

  • Human evaluation results: Table 1 presents the results of human evaluation. The scores given by the two groups are broadly consistent, and the expert group's scores are generally lower than those of the ordinary group

Summary

Introduction

Deep-learning-based sequence analysis has many applications [1], [2], including language processing [3], video tracking [4], cross-domain analysis [5], [6], and semantic-feature-based sentiment analysis [7]. Cross-domain sequence analysis, one of its important branches, refers to finding the correspondence between two different types of sequences. Audio-to-video analysis is a special case of cross-domain sequence analysis [12], [13]. Compared with other topics, research on audio-video analysis is relatively scarce. One reason the problem is difficult is that the correspondence is not one-to-one: a particular video scene may correspond to multiple audio sequences, and a particular audio sequence can serve as background audio for multiple video scenes.
