YOLO-Rlepose: Improved YOLO Based on Swin Transformer and Rle-Oks Loss for Multi-Person Pose Estimation

Yi Jiang, Jinlin Zhu, Li Qin, Kexin Yang

doi:10.3390/electronics13030563

Open Access

Journal: Electronics	Publication Date: Jan 30, 2024
Citations: 1	License type: CC BY 4.0

Affiliation: Harbin University of Science and Technology

Abstract

Human pose estimation has progressed significantly in recent years, driven by the widespread adoption of deep convolutional neural networks. Multi-person 2D pose estimation nevertheless remains highly challenging because of occlusion, noise, and non-rigid body movements. Most current multi-person approaches handle joint localization and joint association as separate steps. This study instead proposes a direct regression-based method, named YOLO-Rlepose, that estimates 2D human poses from a single image. Compared with traditional methods, YOLO-Rlepose uses Transformer models to better capture global dependencies between image feature blocks, and it preserves sufficient spatial information for keypoint detection through a multi-head self-attention mechanism. Two enhancements further improve the accuracy of the model. First, the study introduces the C3 Module with Swin Transformer (C3STR), which builds on the C3 module in You Only Look Once (YOLO) by adding a Swin Transformer branch, strengthening the model’s ability to capture global and rich contextual information. Second, it proposes a novel loss function named Rle-Oks loss. This loss facilitates training by learning the distributional changes through Residual Log-likelihood Estimation (RLE), and it includes a weight coefficient that assigns different weights to keypoints according to their importance in the human body. Experiments demonstrate the effectiveness of the proposed YOLO-Rlepose model: on the COCO dataset, it outperforms the previous state-of-the-art (SOTA) method by 2.11% in AP.
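
The abstract does not spell out the Rle-Oks loss, so the following is only a minimal sketch of the standard COCO-style Object Keypoint Similarity (OKS) that the loss name refers to, not the authors' loss. The function name `oks`, its argument layout, and the sample numbers are assumptions made for illustration; the RLE term and the paper's weight coefficient are not reproduced here.

```python
import math


def oks(pred, gt, visible, area, k):
    """COCO-style OKS between one predicted pose and one ground-truth pose.

    pred, gt : lists of (x, y) keypoint coordinates, same length and order
    visible  : list of booleans, True where the ground-truth keypoint is labeled
    area     : object scale squared (s^2), e.g. the ground-truth segment area
    k        : per-keypoint falloff constants (in COCO, k_i = 2 * sigma_i)
    """
    total = 0.0
    count = 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visible, k):
        if not v:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        count += 1
    # By convention in this sketch, a pose with no labeled keypoints scores 0.
    return total / count if count else 0.0


# Hypothetical two-keypoint pose, used only to exercise the function.
gt_pose = [(10.0, 10.0), (20.0, 20.0)]
shifted = [(10.0, 10.0), (23.0, 24.0)]  # second keypoint is off by 5 pixels
falloff = [0.1, 0.1]

perfect_score = oks(gt_pose, gt_pose, [True, True], 100.0, falloff)
shifted_score = oks(shifted, gt_pose, [True, True], 100.0, falloff)
```

A loss built on OKS is commonly taken as one minus this similarity, so a perfect prediction costs nothing and the penalty grows with scale-normalized keypoint distance; how the paper combines this with RLE and its keypoint weights is not stated in the abstract.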
