Beyond their mainstream audience, sports videos are widely used by athletes and coaches for training and match analysis. To retrieve and index sports videos efficiently from large video datasets, they must be classified into different genres. Manual labelling is error-prone and yields low accuracy, while classification based on video content analysis remains challenging for computer vision techniques. This work introduces an improved focus-net deep learning (DL) model, a convolutional squeeze U-Net based encoder-decoder, for sports video retrieval and classification. First, keyframes are extracted from the input sports video using clustering and optical flow analysis. Next, the frames are pre-processed with a smoothed shock filter to remove noise. Image segmentation is then performed with the convolutional squeeze U-Net based encoder-decoder, in which a convolutional neural network (CNN) at the encoder extracts features that are fed to the decoder for video classification. Finally, the sports video is classified by a softmax classifier. Experiments are performed on the UCF101 dataset, where the proposed model achieves an overall accuracy of 99.68%. These results demonstrate that the proposed focus-net model can be applied effectively to sports video classification.
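
To make the pipeline concrete, the following is a minimal PyTorch sketch of two of the stages described above: keyframe selection by clustering optical-flow statistics, and a squeeze-style U-Net encoder-decoder ending in a softmax classification head. All names (extract_keyframes, Fire, SqueezeUNetClassifier), layer sizes, and design details are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; not the paper's code. Layer widths and the
# keyframe heuristic are assumptions made for readability.
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans


def extract_keyframes(video_path, n_clusters=16):
    """Select keyframes by clustering frames on mean optical-flow magnitude
    (one plausible reading of the clustering + optical-flow step)."""
    cap = cv2.VideoCapture(video_path)
    frames, motion = [], []
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        motion.append(np.linalg.norm(flow, axis=2).mean())
        frames.append(frame)
        prev_gray = gray
    cap.release()
    # Cluster the per-frame motion scores; the frame closest to each
    # cluster centre is kept as a keyframe.
    scores = np.array(motion).reshape(-1, 1)
    km = KMeans(n_clusters=min(n_clusters, len(frames)), n_init=10).fit(scores)
    ids = {int(np.argmin(np.abs(scores[:, 0] - c))) for c in km.cluster_centers_[:, 0]}
    return [frames[i] for i in sorted(ids)]


class Fire(nn.Module):
    """SqueezeNet-style fire block: 1x1 squeeze conv, then 1x1/3x3 expand convs."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, 1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1)

    def forward(self, x):
        x = F.relu(self.squeeze(x))
        return torch.cat([F.relu(self.expand1(x)), F.relu(self.expand3(x))], dim=1)


class SqueezeUNetClassifier(nn.Module):
    """Encoder-decoder built from fire blocks with a softmax classification head."""
    def __init__(self, num_classes=101):
        super().__init__()
        self.enc1 = Fire(3, 16, 32)        # -> 64 channels
        self.enc2 = Fire(64, 32, 64)       # -> 128 channels
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec = Fire(64 + 64, 32, 64)   # skip connection from enc1 -> 128 channels
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        e1 = self.enc1(x)                          # encoder: CNN feature extraction
        e2 = self.enc2(self.pool(e1))
        d = self.up(e2)                            # decoder: upsample
        d = F.interpolate(d, size=e1.shape[2:])    # align spatial size for the skip
        d = self.dec(torch.cat([d, e1], dim=1))
        pooled = d.mean(dim=(2, 3))                # global average pooling
        return F.softmax(self.head(pooled), dim=1) # class probabilities
```

In practice, the selected keyframes would be resized, normalised, and batched before being passed to the model, and the softmax output would give probabilities over the 101 UCF101 action categories; during training one would typically return the raw logits and apply a cross-entropy loss instead of the explicit softmax shown here.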