Music2Dance: DanceNet for Music-Driven Dance Generation
Synthesizing human motion from music (i.e., music to dance) is appealing and has attracted considerable research interest in recent years. It is challenging because dance requires realistic and complex human motion, and, more importantly, the synthesized motion should be consistent with the style, rhythm, and melody of the music. In this article, we propose a novel autoregressive generative model, DanceNet, that takes the style, rhythm, and melody of the music as control signals to generate 3D dance motions with high realism and diversity. Owing to the high long-term spatio-temporal complexity of dance, we adopt dilated convolutions to enlarge the receptive field, and use gated activation units as well as separable convolutions to enhance the fusion of motion features and control signals. To boost the performance of the proposed model, we capture several synchronized music-dance pairs performed by professional dancers and build a high-quality music-dance pair dataset. Experiments demonstrate that the proposed method achieves state-of-the-art results.
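The combination of dilated causal convolutions with gated activation units conditioned on a control signal that this abstract describes follows the WaveNet recipe. A minimal NumPy sketch of one such gated block (shapes, weight names, and the conditioning scheme are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1D convolution: the output at time t depends only on
    x[<= t]. x: (T, C_in), w: (K, C_in, C_out) with K filter taps."""
    T = x.shape[0]
    K, _, c_out = w.shape
    pad = dilation * (K - 1)
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    out = np.zeros((T, c_out))
    for t in range(T):
        for k in range(K):
            # tap k looks dilation*k steps into the past
            out[t] += xp[t + pad - k * dilation] @ w[K - 1 - k]
    return out

def gated_block(x, ctrl, w_f, w_g, v_f, v_g, dilation):
    """WaveNet-style gated activation with a control signal (e.g. music
    features): z = tanh(conv_f(x) + ctrl @ v_f) * sigmoid(conv_g(x) + ctrl @ v_g)."""
    f = causal_dilated_conv1d(x, w_f, dilation) + ctrl @ v_f
    g = causal_dilated_conv1d(x, w_g, dilation) + ctrl @ v_g
    return np.tanh(f) * (1.0 / (1.0 + np.exp(-g)))
```

Stacking such blocks with exponentially growing dilation (1, 2, 4, ...) is what lets the receptive field cover long motion histories at modest depth.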
- Dissertation
- 10.12681/eadd/28428
- Mar 1, 2009
The efficient and reliable human-centred design of products and processes is a major goal of the manufacturing industry. Thus, numerous aspects related to performance, safety, and ergonomics need to be verified using Simulation and Virtual Reality techniques in the context of the product development procedure. The realistic and accurate representation of human motion in a Virtual Environment is crucial for the reliability of the simulation results. In this context, this dissertation focuses on the design and development of a novel methodology for human motion modelling, based on the adaptation of a given motion of a digital human model to new anthropometrics and environment constraints (related to the virtual prototype or workspace). The proposed approach aims at the generation of realistic and reliable digital human motions in order to drive computer manikins in a Virtual Environment, so as to obtain reliable evaluation results during the ergonomic design of a product or a production line's workspace. The introductory chapter presents both the importance and the limitations of ergonomic design using computer manikins, which constituted the major motivation of this research work. The state of the art is presented next, covering other approaches to human motion modelling using computer manikins, as well as software tools for digital human modelling and ergonomic design. The next chapter presents an extensive analysis aimed at a better understanding of human motion. This analysis is based on a Statistical Design Of Experiments (SDoE) and makes use of experimental motion-capture data. Analysis of Variance (ANOVA) was performed to determine the impact of the anthropometric parameters influencing the human motion path.
Semi-empirical additive models were developed next, based on the results of this analysis, which connect the effect of anthropometrics with the trajectories of the markers attached to the human body during the motion capture procedure. The composition of the proposed motion modelling methodology follows. Given that human motion is analysed as a set of sequential motion frames, the modelling methodology aims at generating the digital human's posture for each frame of the desired motion scenario. A motion scenario is each possible combination of "task – computer manikin – environment". For the creation of each new motion frame, the algorithm of the methodology generates alternative postures, ensuring the rejection of non-realistic and constraint-violating postures. The basic concept of the modelling methodology is multi-criteria decision making, which is used for evaluating the alternatives and selecting the best-ranked human postures that constitute the new human motion. The criteria concern both the similarity of the new motion to the reference motion and the satisfaction of the new constraints related to the geometric modifications of the working environment. The description and detailed design of the primary and secondary components of the implemented system are presented next.
The developed system consists of the following primary components: i) the database, which includes reference motions, computer manikins, virtual environments, and tasks; ii) the alternative-generation mechanism, which takes into account the new constraints; iii) the evaluation criteria for alternatives, which are related to joint-angle and end-effector similarity; iv) the decision matrix, which calculates the evaluation score of each alternative posture based on the criteria; v) the aggregation mechanism, which calculates the utility score of each alternative based on the evaluation scores and the weights of the criteria; and vi) the ranking mechanism, which sorts the alternatives by utility score and selects the best-ranked alternative for each motion frame. The developed system enables the creation of adapted motions for digital humans that satisfy the new conditions and constraints. These arise from modifying the anthropometrics of the digital human model performing the motion and/or modifying the shape/geometry of the working environment. The efficiency of the proposed methodology is evaluated through a set of experiments in a pilot application from the automotive industry, which targets the ergonomic evaluation of the interior design of a passenger car, focusing mainly on optimizing the positions of the driver's seat and the door handle. The evaluation demonstrates the prediction capabilities of the algorithm when both anthropometric and environment parameters are modified. The algorithm generates accurate and realistic human motions that can be used efficiently to improve the computer-aided ergonomic design of manufacturing products and processes.
- Book Chapter
3
- 10.1007/978-3-030-20131-9_143
- Jan 1, 2019
A standing-up assistance chair without any electric actuator was developed in this research for people with weak muscles, such as elderly people and patients. The mechanism was designed based on real human standing-up motion. To reproduce this motion, three design principles were adopted: 1) the leaning-forward motion (forward movement of the shoulder) begins at 40% of the standing-up phase, before the hip joint is lifted; 2) the trajectory of the user's hip joint is a straight line at an angle of 45 [deg]; 3) the device supports the user until his/her knee joint angle reaches around 60 [deg]. An eight-linkage mechanism based on Hart's exact straight-line mechanism was adopted, which realizes an approximately straight-line trajectory at an angle of 45 [deg]. Based on real human motion and the design principles, the link lengths of the device were determined by simulating their effect on the hip joint angle and the trajectory of the COG. We confirmed that both the leaning-forward and the recovery phases of standing up can be adequately assisted, and that the torque on each joint is decreased by using our assistance chair. For users of different sizes, we intend to build S, M, and L variants of the device based on average human body data [3].
- Conference Article
5
- 10.5220/0005304303320339
- Jan 1, 2015
Data-driven animation using a large human motion database enables the programming of various natural human motions. While motion capture systems allow the acquisition of realistic human motion, the captured motion must be segmented into a series of primitive motions to construct a motion database. Although most segmentation methods have focused on periodic motion, e.g., walking and jogging, segmenting non-periodic and asymmetrical motions such as dance performances remains a challenging problem. In this paper, we present a segmentation approach specialized for human dance motion. Our approach consists of three steps, based on the assumption that human dance motion is composed of consecutive choreographic primitives. First, we conduct an investigation based on dancer perception to determine the segmentation components. Second, after professional dancers have selected segmentation sequences, we use their selections to define rules for segmenting choreographic primitives. Finally, the accuracy of our approach is verified by a user study, which shows that our approach is superior to existing segmentation methods. Through these three steps, we demonstrate automatic dance motion synthesis based on the obtained choreographic primitives.
- Conference Article
2
- 10.1109/urai.2011.6145980
- Nov 1, 2011
Real-time human body motion estimation plays an important role in perception for robotics, especially in applications of human-robot interaction and service robotics. In this paper, we propose a method for real-time 3D human body motion estimation based on 3-layer laser scans. All useful scanned points, representing the human body contour, are obtained by subtracting the learned background of the environment. For human contour feature extraction, and to avoid cases of unsuccessful segmentation, we propose a novel iterative template-matching algorithm for clustering, in which the templates of the torso and hip sections are modeled with different radii. Robust, distinct human motion features are extracted using maximum likelihood estimation and nearest-neighbor clustering. Subsequently, the positions of the human joints in 3D space are recovered by associating the extracted features with a pre-defined articulated model of the human body. Finally, we demonstrate the proposed methods through experiments, which show accurate human body motion tracking in real time.
- Conference Article
1
- 10.1109/icip46576.2022.9897884
- Oct 16, 2022
In this paper, a deep learning-based model for 3D human motion generation from text is proposed via gesture action classification and an autoregressive model. The model focuses on generating gestures that express human thinking, such as waving and nodding. To achieve this goal, the proposed method predicts the expression from sentences using a text classification model based on a pretrained language model, and generates gestures using a gated recurrent unit-based autoregressive model. In particular, we propose a loss on the embedding space for faithfully restoring raw motions and generating good intermediate motions. Moreover, a novel data augmentation method and a stop token are proposed to generate variable-length motions. To evaluate the text classification model and the 3D human motion generation model, a gesture action classification dataset and an action-based gesture dataset are collected. In several experiments, the proposed method successfully generates perceptually natural and realistic 3D human motion from text. Moreover, we verified the effectiveness of the proposed method on a publicly available action recognition dataset to evaluate cross-dataset generalization performance.
- Research Article
2
- 10.1109/tpami.2024.3388042
- Jan 1, 2024
- IEEE transactions on pattern analysis and machine intelligence
Generating realistic 3D human motion has been a fundamental goal of the game/animation industry. This work presents a novel transition generation technique that can bridge the actions of people in the foreground by generating 3D poses and shapes in-between photos, allowing 3D animators/novice users to easily create/edit 3D motions. To achieve this, we propose an adaptive motion network (ADAM-Net) that effectively learns human motion from masked action sequences to generate kinematically compliant 3D poses and shapes in-between given temporally-sparse photos. Three core learning designs underpin ADAM-Net. First, we introduce a random masking process that randomly masks images from an action sequence and fills masked regions in latent space by interpolation of unmasked images to simulate various transitions under given temporally-sparse photos. Second, we propose a long-range adaptive motion (L-ADAM) attention module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in a sequence, along with a multi-head cross-attention. Third, we develop a short-range adaptive motion (S-ADAM) attention module that weightedly selects and integrates adjacent feature representations at different levels to strengthen temporal correlation. By coupling these designs, the results demonstrate that ADAM-Net excels not only in generating 3D poses and shapes in-between photos, but also in classic 3D human pose and shape estimation.
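ADAM-Net's random masking step, as described above, fills masked frames in latent space by interpolating between the unmasked ones. A toy NumPy sketch of that fill rule (linear interpolation between the nearest kept frames; the function and variable names are my own, not the paper's):

```python
import numpy as np

def mask_and_interpolate(latents, keep_idx):
    """Given per-frame latent codes (T, D) and the indices of unmasked
    frames, fill every masked frame by linear interpolation between its
    nearest unmasked neighbours on each side (a simplified stand-in for
    the masking/interpolation step)."""
    T, _ = latents.shape
    keep = sorted(keep_idx)
    out = latents.copy()
    for t in range(T):
        if t in keep:
            continue  # unmasked frames pass through unchanged
        left = max([i for i in keep if i < t], default=None)
        right = min([i for i in keep if i > t], default=None)
        if left is None:
            out[t] = latents[right]       # no kept frame before t
        elif right is None:
            out[t] = latents[left]        # no kept frame after t
        else:
            a = (t - left) / (right - left)
            out[t] = (1 - a) * latents[left] + a * latents[right]
    return out
```

Training on such filled sequences simulates the temporally-sparse photo inputs the network must bridge at inference time.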
- Research Article
31
- 10.1109/tcsvt.2023.3255186
- Oct 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
Human motion prediction aims to predict how humans will move given a historical sequence of 3D human motions. Recent transformer-based methods have attracted increasing attention and demonstrated promising performance in 3D human motion prediction. However, existing methods generally decompose the input human motion information into separate spatial and temporal branches and seldom consider the inherent coherence between the two, hence often failing to register dynamic spatio-temporal information during training. Motivated by these issues, we propose a spatio-temporal cross-transformer network (STCT) for 3D human motion prediction. Specifically, we investigate various types of interaction methods (i.e., concatenation interaction, Msg token interaction, and cross-transformer) to capture the coherence of the spatial and temporal branches. According to the obtained results, the proposed cross-transformer interaction method shows its superiority over the other methods. Meanwhile, considering that most existing works treat the human body as a set of 3D joint positions, the predicted joints gradually depart from a realistic human body, exhibiting unreasonable bone lengths and implausible poses as time progresses. We therefore resort to the bone constraints of a human mesh to produce more realistic human motions. By fitting a parametric body model (i.e., the SMPL-X model) to the predicted human joints, a reconstruction loss function is proposed to remedy the unreasonable bone lengths and pose errors. Comprehensive experiments on the AMASS and Human3.6M datasets demonstrate that our method achieves superior performance over the compared methods.
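The cross-transformer interaction described above lets one branch attend to the other. A single-head scaled dot-product cross-attention in NumPy (queries drawn from, say, the spatial branch, keys/values from the temporal branch; all dimensions and names are illustrative, not the paper's):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, w_q, w_k, w_v):
    """One cross-attention head: queries come from one branch (e.g. spatial
    tokens), keys and values from the other (e.g. temporal tokens)."""
    q = q_tokens @ w_q
    k = kv_tokens @ w_k
    v = kv_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N_q, N_kv)
    return attn @ v, attn
```

Because the queries and keys come from different branches, each output token is a mixture of the other branch's values, which is precisely how the two streams exchange spatio-temporal information.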
- Research Article
83
- 10.1111/j.1600-0668.2008.00527.x
- Apr 18, 2008
- Indoor Air
An immersed boundary method for particulate flow in an Eulerian framework is utilized to examine the effects of complex human motion on the transport of trace contaminants. The moving human object is rendered as a level set in the computational domain, and realistic human walking motion is implemented using a human kinematics model. A large eddy simulation (LES) technique is used to simulate the fluid and particle dynamics induced by human activity. Parametric studies are conducted within a Room-Room and a Room-Hall configuration, each separated by an open doorway. The effects of the average walking speed, initial proximity from the doorway, and the initial mass loading on room-to-room contaminant transport are examined. The rate of mass transport increases as the walking speed increases, but the total amount of material transported is more influenced by the initial proximity of the human from the doorway. The Room-Hall simulations show that the human wake transports material over a distance of about 8 m. Time-dependent data extracted from the simulations is used to develop a room-averaged zonal model for contaminant transport due to human walking motion. The model shows good agreement with the LES results. The effect of human activity on contaminant transport may be important in applications such as clean or isolation room design for biochemical production lines, in airborne infection control, and in entry/exit into collective protection or decontamination systems. The large eddy simulations (LES) performed in this work allow precise capturing of the local wakes generated by time-dependent human motion and thus provide a means of quantifying contaminant transport due to wake effects. The LES database can be used to develop zonal models for the bulk effects of human-induced contaminant transport. These may be incorporated into multi-zone infiltration models for use in threat-response and exposure mitigation studies.
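The room-averaged zonal model mentioned above reduces, in its simplest form, to two well-mixed zones coupled by a single interzonal exchange flow; the walking-induced flow rate would be fitted from the LES data. A minimal sketch under that assumption (all parameter values are hypothetical, not from the paper):

```python
def two_zone_exchange(c1, c2, v1, v2, q, dt, steps):
    """Forward-Euler integration of a two-zone mixing model: a single
    interzonal volume flow q [m^3/s] exchanges contaminant between a room
    of volume v1 [m^3] at concentration c1 and a room of volume v2 at c2."""
    for _ in range(steps):
        flux = q * (c1 - c2)       # net contaminant flow, zone 1 -> zone 2
        c1 -= flux / v1 * dt
        c2 += flux / v2 * dt
    return c1, c2
```

Both concentrations relax exponentially toward the volume-weighted equilibrium, and the total mass v1*c1 + v2*c2 is conserved, which is the behaviour a multi-zone infiltration model would inherit from such a component.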
- Conference Article
105
- 10.1109/cvpr46437.2021.00928
- Jun 1, 2021
Synthesizing 3D human motion plays an important role in many graphics applications as well as in understanding human activity. While many efforts have been made on generating realistic and natural human motion, most approaches neglect the importance of modeling human-scene interactions and affordance. On the other hand, affordance reasoning (e.g., standing on the floor or sitting on a chair) has mainly been studied with static human poses and gestures, and has rarely been addressed with human motion. In this paper, we propose to bridge human motion synthesis and scene affordance reasoning. We present a hierarchical generative framework to synthesize long-term 3D human motion conditioned on the 3D scene structure. Building on this framework, we further enforce multiple geometric constraints between the human mesh and scene point clouds via optimization to improve realism. Our experiments show significant improvements over previous approaches on generating natural and physically plausible human motion in a scene.
- Research Article
2
- 10.3390/electronics14030605
- Feb 4, 2025
- Electronics
Many applications benefit from the prediction of 3D human motion from past observations, e.g., human–computer interaction and autonomous driving. However, while existing encoding–decoding methods achieve good performance, prediction over a range of seconds still suffers from accumulated errors and a scarcity of motion switching. In this paper, we propose a Latent Diffusion and Physical Principles Model (LDPM) to achieve accurate human motion prediction. Our framework performs human motion prediction by learning a latent space, generating motion from noise, and incorporating physical control of body motion, where the physical principles estimate the next frame through the Euler–Lagrange equation. The framework effectively accomplishes motion switching and reduces the error accumulated over time. The proposed architecture is evaluated on three challenging datasets: Human3.6M (Human 3D Motion Capture Dataset), HumanEva-I (Human Evaluation dataset I), and AMASS (Archive of Motion Capture as Surface Shapes). We experimentally demonstrate the significant superiority of the proposed framework over a prediction range of seconds.
- Conference Article
79
- 10.1145/1015330.1015343
- Jan 1, 2004
We describe a sparse Bayesian regression method for recovering 3D human body motion directly from silhouettes extracted from monocular video sequences. No detailed body shape model is needed, and realism is ensured by training on real human motion capture data. The tracker estimates 3D body pose by using Relevance Vector Machine regression to combine a learned autoregressive dynamical model with robust shape descriptors extracted automatically from image silhouettes. We studied several different combination methods, the most effective being to learn a nonlinear observation-update correction based on joint regression with respect to the predicted state and the observations. We demonstrate the method on a 54-parameter full body pose model, both quantitatively using motion capture based test sequences, and qualitatively on a test video sequence.
- Conference Article
3
- 10.1109/uemcon.2017.8249025
- Oct 1, 2017
Human kinetic energy is considered a promising green energy source to enable the human-powered Internet of Things (IoT), as constrained lifetime has become a bottleneck problem for IoT devices. However, the scarce energy collected from human motion severely restricts the operation of human-powered IoT and stresses the need for an optimized inertial harvester that provides more energy from daily human activities. In this paper, we investigate the feasibility and efficiency of using a single-frequency inertial energy harvester, optimized on a typical one-day motion trace of a human subject, to harvest kinetic energy from multiple days of activity by the same subject. To facilitate this investigation, we propose a novel optimization framework to maximize the power harvested from daily human motion using a single-frequency energy harvester. By analyzing the frequency characteristics of daily human motion and the inertial harvester model, the optimal harvester parameters are determined to maximize power generation from a typical one-day motion, and are then used to harvest power from the same subject's motion on other days. A real-world human motion dataset is used for evaluation. The results demonstrate that the proposed method maximizes the power generated from one-day motion. Furthermore, the optimal harvester parameters determined from a one-day trace also achieve near-optimal harvested power on other days.
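Tuning a single-frequency (resonant) harvester to a subject's daily motion comes down to finding the dominant spectral component of the acceleration trace. A minimal sketch of that frequency analysis (the actual framework also optimizes damping and proof-mass parameters, which are omitted here):

```python
import numpy as np

def dominant_frequency(accel, fs):
    """Return the frequency [Hz] with the largest spectral magnitude in an
    acceleration trace sampled at fs [Hz], ignoring the DC component.
    A linear inertial harvester tuned to resonate at this frequency
    extracts the most power from the excitation."""
    spec = np.abs(np.fft.rfft(accel - np.mean(accel)))  # remove DC, take magnitude
    freqs = np.fft.rfftfreq(len(accel), d=1.0 / fs)
    return freqs[np.argmax(spec)]
```

For walking-dominated days the peak typically sits near the step frequency, which is why parameters fitted on one day transfer well to other days of similar activity.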
- Conference Article
1
- 10.1109/icsmc.2004.1399855
- Oct 10, 2004
This paper proposes a recognition algorithm based on a kernel classifier for daily human actions such as walking or lying down. The advantage of the proposed algorithm is that it embeds qualitative human knowledge while achieving robust recognition accuracy. The main features of the presented method are: (1) utilizing a Gaussian process with latent variables to relate recognized labels to input human motion; (2) in order to embed prior knowledge for proper recognition of novel motions dissimilar to the learned motion data, assigning probabilistic labels to virtual human motions generated in sparse areas of the input motion feature space; (3) learning the parameters of the classifier from labeled real human motions and the virtual motions in a Bayesian perspective. The results of a cross-validation-like experiment show that the accuracy of the proposed method is as good as that of support vector classification-based recognition methods. It is also shown that the proposed method can recognize novel motions consistent with human common sense even when classifiers without embedded knowledge fail to recognize them.
- Research Article
39
- 10.1109/jsen.2018.2820644
- May 15, 2018
- IEEE Sensors Journal
Human kinetic energy is regarded as a promising sustainable energy source to solve the energy bottleneck of the Internet of Things (IoT). The low power harvested from human motion and the scarce hardware resources of IoT devices severely restrict the operation of kinetic energy harvesting IoT and stress the need for power management strategies that improve energy efficiency. In this paper, we propose a novel power management framework for kinetic energy harvesting IoT, composed of an off-line inertial harvester optimization algorithm and an on-line joint sink selection and transmission power control module. By analyzing the characteristics of daily human motion and the inertial harvester model, the optimal harvester parameters are determined to maximize power generation from daily human motion. The on-line scheme improves energy efficiency by jointly considering optimal sink selection (i.e., on-body sink or off-body sink) and transmission power control. A real-world human motion dataset is used to evaluate the proposed framework. The simulation results indicate that, compared with the existing approach, the proposed kinetic harvester optimization algorithm achieves an 83.31% to 135.69% improvement in the power harvested from the same human motion trace. In addition, the proposed on-line joint sink selection and transmission power control yields a 7.07% to 34.23% improvement in transmission energy efficiency.
- Conference Article
7
- 10.1145/3474349.3480219
- Oct 10, 2021
The synthesis of complicated and realistic human motion is a challenging problem and a significant task for the game, film, and animation industries. Many existing methods rely on complex and time-consuming keyframe-based pipelines that demand professional skills in animation software and motion capture hardware. Casual users, on the other hand, seek a playful experience for animating their favorite characters with a simple and easy-to-use tool. Recent work has explored building intuitive animation systems but suffers from an inability to generate complex and expressive motions. To tackle this limitation, we present a keyframe-driven animation synthesis algorithm that produces complex human motions from a few input keyframes, allowing the user to control the keyframes at will. Inspired by the success of attention-based techniques in natural language processing, our method completes body motions in a sequence-to-sequence manner and captures motion dependencies both spatially and temporally. We evaluate our method qualitatively and quantitatively on the LaFAN1 dataset, demonstrating improved accuracy compared with state-of-the-art methods.