Abstract

Automatic event recognition in sports photos is an interesting and valuable research topic in computer vision and deep learning. With the rapid growth and spread of data, which is captured continuously, providing fast and precise access to the right information has become a challenging task of considerable importance for many practical applications, e.g., sports image and video search, sports data analysis, healthcare monitoring, monitoring and surveillance systems for indoor and outdoor activities, and video captioning. In this paper, we evaluate different deep learning models for recognizing and interpreting sport events in the Olympic Games. To this end, we collect a dataset dubbed the Olympic Games Event Image Dataset (OGED), which includes 10 different sport events scheduled for the Olympic Games Tokyo 2020. Transfer learning is then applied to three popular deep convolutional neural network architectures, namely AlexNet, VGG-16 and ResNet-50, along with various data augmentation methods. Extensive experiments show that ResNet-50 with the proposed photobombing guided data augmentation achieves 90% accuracy.

Highlights

  • Sports are a major section of media, accounting for a massive portion of TV broadcasting, and they have become a dominant focus in the field of entertainment, thanks to the massive commercial appeal of sports programs [1]

  • Transfer learning is applied to three popular deep convolutional neural network architectures, namely AlexNet, VGG-16 and ResNet-50, along with various data augmentation methods

  • With the rapid growth and spread of sports data, providing fast and accurate access to the right information has become a challenging task of considerable importance for many practical applications


Summary

Introduction

Sports are a major section of media, accounting for a massive portion of TV broadcasting, and they have become a dominant focus in the field of entertainment, thanks to the massive commercial appeal of sports programs [1]. Scientists and researchers have developed various deep learning models and methods to efficiently classify huge datasets based on different input types, including images and videos. Five convolutional layers, C1–C5, form the architecture of AlexNet, followed by three fully connected layers, FC6–FC8. This deep convolutional neural network, used in [8], is one of the networks that we use in our work, where fine-tuning is used to optimize the recognition results. 80% of the OGED images were used for training the networks and 20% were used for testing. The images in this dataset were captured at random from YouTube videos of real Olympic Games featuring different events, ranging from Atlanta 1996 to the most recent, Rio 2016, owing to the availability of video data. Details about the data augmentation techniques are presented in the subsection
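As an illustration of the 80%/20% train/test division mentioned above, the following is a minimal sketch in Python. It is not the authors' code. The function name `stratified_split`, the file paths, the figure of 50 images per event, and the choice to split each class separately (a stratified split) are all assumptions made for this example; the summary states only the overall 80/20 ratio and the 10 event classes.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split (path, label) pairs into train and test lists, class by class.

    Splitting per class is an assumption of this sketch; it keeps the
    80/20 ratio inside every event category.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        cut = round(len(paths) * train_fraction)
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Hypothetical stand-in for the dataset: 10 events with 50 images each.
samples = [
    (f"event_{e}/img_{i:03d}.jpg", f"event_{e}")
    for e in range(10)
    for i in range(50)
]
train_set, test_set = stratified_split(samples)
print(len(train_set), len(test_set))
```

With the placeholder data, each of the 10 classes contributes 40 training and 10 test images. A plain random split over the whole dataset would also satisfy the 80/20 ratio, but it could leave some events under-represented in the test set.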

Photobombing Guided Data Augmentation
Performance of Pre-Trained Models with KNN
Conclusions and Future Work