Multi-order spatial interaction network for human pose estimation

Dong Wang, Wenjun Xie, Youcheng Cai, Xinjie Li, Xiaoping Liu

doi:10.1016/j.dsp.2023.104219

Abstract

Vision Transformers have recently been applied to human pose estimation and achieve excellent performance through the two-order spatial interaction provided by self-attention. However, it remains unclear whether higher-order spatial interaction can further benefit pose estimation. In this paper, we propose a novel approach based on multi-order spatial interactions and confirm that combining different orders is beneficial for the human pose estimation task. We first build a Triple Interaction Module (TIM) from pure convolutions that performs spatial information interaction three times. In contrast to the Transformer, the TIM is compatible with several pure convolutions and extends the Transformer's two-order interaction to triple-order without extensive additional computation, which makes it easier to explore inter-related features between keypoints of the human body. In addition, we combine the TIM with a traditional CNN and a Transformer to form the Multi-order Spatial Interaction Network (MSIN). We use MSIN to extract keypoint heatmaps and verify that the order-by-order structure enhances the overall performance of locating human keypoints. Experimental results demonstrate that MSIN performs favorably against state-of-the-art CNN-based and Transformer-based counterparts on the COCO and MPII datasets while being more lightweight.
