Abstract
There is growing interest in adopting 3D human pose estimation in safety-critical systems, from healthcare to Industry 5.0. When applied in such settings, however, the underlying neural networks may suffer from estimation inaccuracy. Besides imprecise or inconsistent annotations in the training dataset, the inaccuracy is caused by poor image quality, rare poses, dropped frames, or heavy occlusions in the scene. In addition, these scenarios often impose temporal constraints on the software results, such as real-time operation and zero or low latency, which make many of the filtering solutions proposed in the literature inapplicable. This paper proposes FLK, a Filter with Learned Kinematics, to refine 3D human motion data in real time and with zero or low latency. The temporal core combines a Kalman filter with a low-pass filter and learns the motion model through a recurrent neural network. The spatial core exploits the biomechanical constraints of the human body to enforce spatial coherency between keypoints. Combining the two cores allows the filter to address different types of noise, from jittering to dropped frames. We test the filter on motion data from multiple datasets and seven 3D human pose estimation backbones, improving accuracy by up to 140 mm with non-Gaussian noise and by up to 53 mm with missing information.
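As a rough illustration of the two ideas the abstract names, the sketch below chains a Kalman filter and a low-pass filter over a single 3D keypoint track, then applies one simple biomechanical constraint (a fixed bone length). It is not the FLK implementation: the learned recurrent motion model is replaced here by a fixed constant-velocity model, and every name and parameter value (`ConstantVelocityKalman1D`, `filter_track`, `enforce_bone_length`, `alpha`, the noise variances) is a hypothetical choice made for this example.

```python
import math


class ConstantVelocityKalman1D:
    """Scalar Kalman filter with state [position, velocity].

    Assumption: a fixed constant-velocity motion model stands in for
    the learned motion model described in the abstract.
    """

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.dt = dt
        self.q = process_var
        self.r = meas_var
        self.x = None                      # [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance

    def step(self, z):
        """Advance one frame; z is a measurement, or None for a dropped frame."""
        if self.x is None:
            if z is None:
                return None                # nothing to predict from yet
            self.x = [z, 0.0]
            return z
        dt = self.dt
        # Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        p, v = self.x
        p_pred = p + dt * v
        (a, b), (c, d) = self.P
        p00 = a + dt * (b + c) + dt * dt * d + self.q
        p01 = b + dt * d
        p10 = c + dt * d
        p11 = d + self.q
        if z is None:
            # Dropped frame: keep the prediction, skip the update.
            self.x = [p_pred, v]
            self.P = [[p00, p01], [p10, p11]]
            return p_pred
        # Update with observation matrix H = [1, 0].
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        resid = z - p_pred
        self.x = [p_pred + k0 * resid, v + k1 * resid]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]


def filter_track(track, alpha=0.5):
    """Kalman-filter then exponentially smooth a list of (x, y, z) tuples.

    A None entry marks a dropped frame.
    """
    kfs = [ConstantVelocityKalman1D() for _ in range(3)]
    smoothed, out = None, []
    for frame in track:
        est = [kf.step(None if frame is None else frame[i])
               for i, kf in enumerate(kfs)]
        if est[0] is None:
            out.append(None)
            continue
        if smoothed is None:
            smoothed = est
        else:
            # First-order low-pass (exponential smoothing).
            smoothed = [alpha * e + (1 - alpha) * s
                        for e, s in zip(est, smoothed)]
        out.append(tuple(smoothed))
    return out


def enforce_bone_length(parent, child, length):
    """Move child along the parent-to-child direction to a fixed distance."""
    d = [c - p for p, c in zip(parent, child)]
    norm = math.sqrt(sum(v * v for v in d))
    if norm == 0.0:
        return list(child)                 # direction undefined; leave as is
    return [p + v * length / norm for p, v in zip(parent, d)]
```

Calling `filter_track([(0, 0, 0), (1, 0, 0), None, (3, 0, 0)])` returns an estimate for the third frame even though its measurement is missing, because the filter falls back on its prediction. The bone-length step can then be applied per frame to each parent and child keypoint pair of a skeleton.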