An enhanced Swin Transformer for soccer player reidentification

Sara Akan, Songül Varlı, Mohammad Alfrad Nobel Bhuiyan

doi:10.1038/s41598-024-51767-4

Abstract

The re-identification (ReID) of objects in images is a widely studied topic in computer vision with significant relevance to many applications. This study focuses on the ReID of players in broadcast videos of team sports. Specifically, we aim to identify the same player in images taken at any given moment of a game from different camera angles. This task differs from conventional person ReID because players on the same team wear very similar clothing, there are few samples per identity, and image resolution is low. One of the hardest parts of object ReID is extracting robust feature representations. Despite the great success of current convolutional neural network (CNN)-based methods, most of them learn only local image representations and neglect long-range dependencies. Transformer-based models are increasingly studied and have yielded encouraging results, but transformers still have difficulty extracting features of small objects and fine visual cues. To address these issues, we enhanced the Swin Transformer by leveraging CNNs, creating a regional feature extraction Swin Transformer (RFES) backbone that improves both local feature extraction and small-scale object feature extraction. We also use three loss functions to handle imbalanced data and emphasize challenging cases. In the retrieval phase, we apply re-ranking with k-reciprocal encoding and report its evaluation results. Finally, we conducted experiments on the Market-1501 and SoccerNet-v3 ReID datasets. Experimental results show that the proposed ReID method reaches a rank-1 accuracy of 96.2% with an mAP of 89.1% on Market-1501 and a rank-1 accuracy of 84.1% with an mAP of 86.7% on SoccerNet-v3, outperforming state-of-the-art approaches.
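The abstract names re-ranking with k-reciprocal encoding for the retrieval phase without spelling out the idea. A minimal NumPy sketch of the core mechanism follows, offered as an illustration and not as the authors' implementation. Two samples are k-reciprocal neighbors when each appears in the other's k nearest neighbors. Each sample is encoded as an exp(-d)-weighted vector over its k-reciprocal set. Query-to-gallery distance then becomes a Jaccard distance between these encodings, blended with the original distance. The function name `simple_rerank` and the values of `k` and `lam` are hypothetical. The sketch also deliberately omits the neighbor-set expansion and local query expansion steps of the full re-ranking method, and it normalizes distances by a single global maximum.

```python
import numpy as np


def simple_rerank(q_feats, g_feats, k=20, lam=0.3):
    """Hypothetical, simplified k-reciprocal re-ranking of query-to-gallery distances.

    Assumption: no neighbor-set expansion and no local query expansion;
    k and lam are illustrative values, not taken from the paper.
    """
    feats = np.vstack([q_feats, g_feats]).astype(np.float64)
    n_q, n = len(q_feats), len(feats)

    # Pairwise Euclidean distances over query + gallery, scaled to [0, 1].
    sq = np.sum(feats ** 2, axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T, 0.0))
    np.fill_diagonal(dist, 0.0)
    dist /= max(dist.max(), 1e-12)

    # k nearest neighbors of every sample (each row includes the sample itself).
    knn = np.argsort(dist, axis=1)[:, : k + 1]

    # Encode sample i as exp(-d) weights over its k-reciprocal set
    # R(i) = {j in kNN(i) : i in kNN(j)}.
    V = np.zeros((n, n))
    for i in range(n):
        recip = np.array([j for j in knn[i] if i in knn[j]], dtype=int)
        V[i, recip] = np.exp(-dist[i, recip])

    # Jaccard distance between each query encoding and every gallery encoding.
    G = V[n_q:]
    jaccard = np.empty((n_q, n - n_q))
    for a in range(n_q):
        inter = np.minimum(V[a], G).sum(axis=1)
        union = np.maximum(V[a], G).sum(axis=1)
        jaccard[a] = 1.0 - inter / union

    # Blend the Jaccard distance with the original (normalized) distance.
    return (1.0 - lam) * jaccard + lam * dist[:n_q, n_q:]
```

As a usage example, `simple_rerank(q, g, k=2)` on two well-separated clusters of 2-D features returns a query-by-gallery matrix whose smallest entry in each row points to a gallery sample from the query's own cluster. Sorting each row in ascending order gives the re-ranked retrieval list from which rank-1 accuracy and mAP would be computed.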

Full Text