Abstract

A new method for detecting important scenes in baseball videos via a time-lag-aware multimodal variational autoencoder (Tl-MVAE) is presented in this paper. Tl-MVAE estimates latent features from tweet, video, and audio features extracted from tweets and videos, and then detects important scenes by estimating, from these latent features, the probability that a scene is important. Notably, time-lags exist between the videos and the tweets users post about them. To account for the time-lags between tweet features and the other features calculated from the corresponding multiple previous events, a feature transformation based on feature correlations that considers such time-lags is newly introduced into the encoder of the MVAE; this is the biggest contribution of the Tl-MVAE. Experimental results obtained from actual baseball videos and their corresponding tweets show the effectiveness of the proposed method.
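The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustrative toy, not the paper's actual architecture: the cosine-similarity softmax weighting over previous events, the linear Gaussian encoder, and the logistic importance head are all assumptions standing in for the real (learned) Tl-MVAE components.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_lag_weights(tweet_feat, event_feats):
    """Correlation-based weights over K previous events (hypothetical form)."""
    # cosine similarity between the tweet feature and each past event feature
    sims = np.array([
        float(tweet_feat @ e /
              (np.linalg.norm(tweet_feat) * np.linalg.norm(e) + 1e-8))
        for e in event_feats
    ])
    exp = np.exp(sims - sims.max())
    return exp / exp.sum()                 # softmax: weights sum to 1

def encode(x, W_mu, W_logvar):
    """Toy linear Gaussian encoder: x -> (mu, logvar) of the latent."""
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar):
    """Standard VAE reparameterization trick: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# toy dimensions: feature size d, K previous events, latent size
d, K, latent = 8, 3, 4
tweet = rng.standard_normal(d)             # tweet feature
events = rng.standard_normal((K, d))       # video/audio features of K past events

w = time_lag_weights(tweet, events)        # time-lag-aware weights
fused = np.concatenate([tweet, w @ events])  # transformed multimodal input

W_mu = rng.standard_normal((latent, 2 * d)) * 0.1
W_logvar = rng.standard_normal((latent, 2 * d)) * 0.1
mu, logvar = encode(fused, W_mu, W_logvar)
z = reparameterize(mu, logvar)             # latent feature

# importance detector: logistic score on the latent feature (hypothetical head)
w_det = rng.standard_normal(latent)
p_important = 1.0 / (1.0 + np.exp(-(w_det @ z)))
print(f"importance probability: {p_important:.3f}")
```

In the actual method these maps would be neural networks trained jointly with the VAE reconstruction and KL losses; the sketch only shows where the time-lag-aware correlation weighting sits relative to the encoder and the detector.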

Highlights

  • We propose a new method for the detection of important scenes in baseball videos via a multimodal variational autoencoder (MVAE) that considers time-lags between tweets and corresponding multiple previous events

  • Since a feature transformation based on feature correlations that considers such time-lags is introduced into the time-lag-aware multimodal variational autoencoder (Tl-MVAE), the proposed method can derive latent features that are effective for modeling the relationships between tweets and videos

  • Our biggest contribution is the development of a method for the detection of important scenes in baseball videos via the Tl-MVAE, which can consider time-lags between tweets and their corresponding multiple previous events


Summary

Introduction

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We propose a new method for the detection of important scenes in baseball videos via the MVAE that considers time-lags between tweets and corresponding multiple previous events. Since a feature transformation based on feature correlations that considers such time-lags is introduced into the Tl-MVAE, the proposed method can derive latent features that are effective for modeling the relationships between tweets and videos. Owing to this novelty, the proposed method can realize accurate detection based on the Tl-MVAE using multimodal features extracted from tweets and videos. We newly introduce the novel Tl-MVAE to the detection of important scenes.

Detection of Important Scenes via the Tl-MVAE
Encoder
Decoder
Important Scene Detector
Final Loss
Experimental Setting
Performance Evaluation
Conclusions