Abstract

Advanced human sensing technologies based on radio frequency (RF) signals have gained widespread attention in recent years. However, because RF signals are sparse and incomplete, fine-grained RF-based multi-person 3D pose estimation has progressed more slowly. In this paper, we present the RF-based Pose Machine (RPM 2.0) for multi-person 3D pose estimation using RF signals. Specifically, we first develop a lightweight anchor-free detector module to locate and crop regions of interest from the horizontal and vertical RF signals. We then treat the horizontal and vertical millimeter-wave radars as “RF cameras” with different viewing angles and propose a Multi-view Fusion Network that unprojects the RF signals into a unified latent feature space and computes their correlation for weighted fusion. Finally, a Spatio-Temporal Attention Network reconstructs the multi-person 3D skeleton sequences: its spatial attention module recovers invisible body parts using non-local correlations among joints, and its temporal attention module refines the 3D pose sequences using temporal coherency learned from frame queries. We evaluate RPM 2.0 and state-of-the-art methods on a large-scale dataset with multi-person 3D pose labels and corresponding radar signals. The experimental results show that RPM 2.0 outperforms all of the baseline methods, locating multi-person 3D key points with an average error of 73 mm, and generalizes well to new data involving conditions such as occlusion and low illumination.
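
To make the reported "average error" concrete, the sketch below shows one common way such a figure is computed for 3D key points: the Euclidean distance between each predicted and ground-truth joint, averaged over joints. The abstract does not define its metric, so this definition is an assumption, and the function name `mean_joint_error_mm` and the sample coordinates are hypothetical and purely illustrative.

```python
import math


def mean_joint_error_mm(pred, truth):
    """Average Euclidean distance between matching 3D joints.

    Assumption: both arguments are equal-length sequences of (x, y, z)
    positions in millimetres, listed in the same joint order.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    distances = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(distances) / len(distances)


# Illustrative values only; these are not data from the paper.
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
truth = [(30.0, 40.0, 0.0), (100.0, 0.0, 120.0)]
error = mean_joint_error_mm(pred, truth)  # joint errors of 50 mm and 120 mm
```

For a multi-person sequence, the same average would typically be taken over all joints, people, and frames.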
