Action Spotting and Temporal Attention Analysis in Soccer Videos

Hiroaki Minoura, Yeongnam Chae, Takayoshi Yamashita, Bjorn Stenger, Mitsuru Nakazawa, Hironobu Fujiyoshi, Tsubasa Hirakawa

doi:10.23919/mva51890.2021.9511342

Abstract

Action spotting is the task of finding specific actions in a video. In this paper, we consider spotting actions in soccer videos, e.g., goals, player substitutions, and card scenes, which are temporally sparse within a complete game. We spot actions using a Transformer model, which can capture relevant features from before and after each action scene. Moreover, we analyze which time steps the model focuses on when predicting an action by inspecting the internal attention weights of the Transformer. Quantitative results on the public SoccerNet dataset show that the proposed method achieves an mAP of 81.6%, a significant improvement over previous methods. In addition, by analyzing the attention weights, we find that the model focuses on different temporal neighborhoods for different actions.
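The abstract does not specify the model's layers, window length, or feature extractor. The following is a minimal NumPy sketch, assuming single-head scaled dot-product self-attention over a window of per-frame features, of how attention weights can be read out to see which temporal offsets the center frame of a window attends to. The function names, dimensions, and random inputs are hypothetical and are not the authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the maximum along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(features, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over T time steps.

    features: (T, d) array of per-frame feature vectors (hypothetical input).
    w_q, w_k, w_v: (d, d_k) projection matrices (hypothetical, random here).
    Returns the attended features (T, d_k) and the attention matrix (T, T);
    row t holds how strongly time step t attends to every time step.
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def attention_profile(weights, center):
    """Map each relative offset (negative = before, positive = after)
    to the attention weight the center time step places on it."""
    offsets = np.arange(weights.shape[0]) - center
    return dict(zip(offsets.tolist(), weights[center].tolist()))


# Hypothetical window: 9 time steps, 16-dim features, 8-dim projections.
rng = np.random.default_rng(0)
T, d, d_k = 9, 16, 8
feats = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d_k)) for _ in range(3))
out, attn = temporal_self_attention(feats, w_q, w_k, w_v)
profile = attention_profile(attn, center=T // 2)
```

Averaging such profiles over many windows that contain, say, goals versus substitutions is one way to compare which temporal neighborhoods a trained model relies on for each action class; with the random weights above the profile carries no meaning and only illustrates the mechanics.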

Full Text