LMFormer: Lightweight and multi-feature perspective via transformer for human pose estimation

Biao Li, Shoufeng Tang, Wenyi Li

doi:10.1016/j.neucom.2024.127884

Abstract

The effectiveness of the Token Mixer in visual tasks is well established; however, its high computational complexity and its relatively narrow, single-perspective approach to modeling spatial relationships present challenges. In this study, we propose LMFormer, a hybrid CNN-Transformer model for human pose estimation. We first design a lightweight multi-feature perspective Token Mixer that uses a lightweight feature reconstruction strategy to aggregate spatial and channel feature information simultaneously, thereby enhancing the performance and generalization capability of the model. We then explore multi-scale information interaction by developing an iterative multi-feature weighting module and by incorporating a multi-scale information propagation mechanism into the skip connections. Finally, we validate the network on the COCO, MPII, and CrowdPose benchmark datasets, using a multi-scale deep supervision strategy. Extensive experiments demonstrate that LMFormer captures multi-scale features comprehensively at reduced computational complexity, resulting in significant performance improvements. Specifically, LMFormer-B achieves an AP of 65.8 on the COCO val set, surpassing MobileNetV2 and ShuffleNetV2 by 1.0 and 5.6 points, respectively. Moreover, its parameter count is only 19.8% and 25% of that of MobileNetV2 and ShuffleNetV2, and its GFLOPs are 43.8% and 50% of theirs, respectively. We aim to provide new insights into lightweight and efficient feature extraction strategies and into efficient Token Mixer designs.
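
The comparison figures in the abstract can be unpacked with a little arithmetic. The sketch below is illustrative only: the baseline AP values and reduction factors it computes are derived from the quoted margins and percentages, not reported in the abstract itself, and the constant and function names are hypothetical.

```python
# Figures quoted in the abstract for LMFormer-B on the COCO val set.
LMFORMER_B_AP = 65.8

# AP margin of LMFormer-B over each baseline, in AP points (from the abstract).
AP_MARGIN = {"MobileNetV2": 1.0, "ShuffleNetV2": 5.6}

# LMFormer-B cost as a fraction of each baseline's cost (from the abstract).
PARAM_SHARE = {"MobileNetV2": 0.198, "ShuffleNetV2": 0.25}
GFLOPS_SHARE = {"MobileNetV2": 0.438, "ShuffleNetV2": 0.50}


def implied_baseline_ap(name: str) -> float:
    """Baseline AP implied by the quoted margin (derived, not reported)."""
    return round(LMFORMER_B_AP - AP_MARGIN[name], 1)


def reduction_factor(share: float) -> float:
    """How many times larger the baseline is, given LMFormer-B's share of it."""
    return round(1.0 / share, 2)


if __name__ == "__main__":
    for name in AP_MARGIN:
        print(
            name,
            "implied AP:", implied_baseline_ap(name),
            "| params x", reduction_factor(PARAM_SHARE[name]),
            "| GFLOPs x", reduction_factor(GFLOPS_SHARE[name]),
        )
```

Read this way, the quoted shares correspond to roughly 5 times fewer parameters than MobileNetV2 and 4 times fewer than ShuffleNetV2, at about half the GFLOPs or less.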

More From: Neurocomputing
