A self-supervised spatio-temporal attention network for video-based 3D infant pose estimation

Wang Yin, Linxi Chen, Xinrui Huang, Chunling Huang, Zhaohong Wang, Yang Bian, You Wan, Yuan Zhou, Tongyan Han, Ming Yi

doi:10.1016/j.media.2024.103208

Abstract

General movement and pose assessment of infants is crucial for the early detection of cerebral palsy (CP). Nevertheless, most human pose estimation methods, in 2D or 3D, focus on adults because large datasets and pose annotations for infants are lacking. To address these problems, we present YOLO-infantPose, a fine-tuned model for 2D infant pose estimation. We further propose STAPose3D, a self-supervised model for video-based 3D infant pose estimation. Because 3D pose annotations are unavailable, we use multi-view video data during training. STAPose3D combines temporal convolution, temporal attention, and graph attention to jointly learn spatio-temporal features of infant pose. Our method has two stages: YOLO-infantPose is first applied to the input videos, and the resulting 2D poses, together with the confidence of each joint, are then lifted to 3D. Using the best-performing 2D detector in the first stage significantly improves the precision of 3D pose estimation. We show that the fine-tuned YOLO-infantPose outperforms the other models tested on our clinical dataset as well as on two public datasets, MINI-RGBD and YouTube-Infant. Results on our infant movement video dataset demonstrate that STAPose3D effectively learns spatio-temporal features across different views and significantly improves the performance of 3D infant pose estimation in videos. Finally, we explore the clinical application of our method to general movement assessment (GMA) on a clinical dataset annotated as normal writhing movements or abnormal monotonic movements according to the GMA standards. We show that the 3D poses estimated by STAPose3D significantly improve GMA prediction performance compared with 2D pose estimation. Our code is available at github.com/wwYinYin/STAPose3D.
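The abstract does not state the training objective, so the following is only an illustrative sketch of one common way multi-view 2D detections with per-joint confidences can supervise a 3D pose without 3D labels: project the estimated 3D pose into each camera and penalize the confidence-weighted distance to the detected 2D joints. The pinhole camera model, the function names, and the `views` dictionary layout are all assumptions made for this example and are not taken from the STAPose3D code.

```python
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Keypoint2D = Tuple[float, float, float]  # (x, y, confidence) from a 2D detector


def to_camera(point: Point3D, rotation: Sequence[Sequence[float]],
              translation: Sequence[float]) -> Point3D:
    """Map a world-frame point into a camera frame: R @ p + t (hypothetical helper)."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def project(point: Point3D, focal: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (assumes z > 0)."""
    x, y, z = point
    return focal * x / z + cx, focal * y / z + cy


def weighted_reprojection_error(pose3d: List[Point3D], detections: List[Keypoint2D],
                                focal: float, cx: float, cy: float) -> float:
    """Confidence-weighted mean squared pixel distance over joints."""
    num, den = 0.0, 0.0
    for joint, (u, v, conf) in zip(pose3d, detections):
        pu, pv = project(joint, focal, cx, cy)
        num += conf * ((pu - u) ** 2 + (pv - v) ** 2)
        den += conf
    # If every joint has zero confidence there is nothing to supervise.
    return num / den if den > 0 else 0.0


def multiview_loss(pose3d_world: List[Point3D], views: List[Dict]) -> float:
    """Average the weighted reprojection error over all camera views."""
    total = 0.0
    for view in views:
        cam_pose = [to_camera(p, view["R"], view["t"]) for p in pose3d_world]
        total += weighted_reprojection_error(
            cam_pose, view["detections"], view["focal"], view["cx"], view["cy"]
        )
    return total / len(views)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]  # two toy joints, in metres
    view = {"R": identity, "t": [0.0, 0.0, 0.0], "focal": 100.0,
            "cx": 50.0, "cy": 50.0,
            "detections": [(50.0, 50.0, 1.0), (110.0, 50.0, 0.5)]}
    print(multiview_loss(pose, [view]))
```

Weighting by confidence means a joint the 2D detector is unsure about (for example, an occluded limb) contributes less to the error, which is one plausible reason to pass confidences to the lifting stage.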

