Abstract

As an emerging technology, human motion prediction has broad application prospects in consumer electronics. Because features at different scales correspond to different receptive fields in neural networks, many recent works extract features at multiple scales to improve prediction accuracy. However, human motion has both temporal and spatial dimensions, and most current approaches perform multiscale feature extraction in only one of them, without fully considering the spatio-temporal dependence. Unlike these works, this paper proposes a novel multiscale spatio-temporal parallel network, named MSTP-Net, which attempts to improve human motion prediction by considering multiscale features in both the temporal and spatial dimensions. The network adopts a two-branch parallel structure to model human poses in the temporal and spatial dimensions, and each branch consists of a corresponding multiscale module and a transformer structure. Because the entire network uses a parallel spatio-temporal multiscale strategy, spatio-temporal features can be extracted more fully. The number of parameters in the proposed network is also reduced compared with the baseline. The proposed MSTP-Net is therefore a lightweight network that can improve prediction accuracy and extend the effective prediction time. The effectiveness of the method is demonstrated by comparing it with state-of-the-art (SOTA) methods on the Human3.6M dataset.
