Abstract

We propose a novel strategy for the automatic detection of highlight events in user-generated tricking videos; to the best of our knowledge, it is the first one specifically tailored to this complex sport. Most current methods for related sports leverage high-level semantics, such as predefined camera angles or common editing practices, or rely on depth cameras to achieve automatic detection. In contrast, our approach relies only on the content of the video frames themselves and consists of a four-stage pipeline. The first stage identifies foreground key points of interest along with an estimate of their motion across the video frames. In the second stage, these points are grouped into regions of interest based on their proximity and motion. Their behavior over time is evaluated in the third stage to generate an attention map indicating the regions involved in the most relevant events. The fourth and final stage extracts the video sequences in which highlights have been identified. Experimental results attest to the effectiveness of our approach, which achieves high recall and precision at the frame level, with detections that align well with the ground-truth events.
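
To make the four-stage pipeline concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes OpenCV with Shi-Tomasi corners and Lucas-Kanade optical flow as stand-ins for the key-point and motion stages, collapses the region-of-interest and attention-map stages into a per-frame motion-energy signal, and uses hypothetical thresholds and the made-up function name detect_highlights purely for illustration.

```python
# Hedged sketch of the abstract's four-stage idea; all parameters and the
# choice of OpenCV primitives are assumptions, not the paper's method.
import cv2
import numpy as np

def detect_highlights(video_path, motion_thresh=2.0, energy_thresh=None):
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return []
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    energy = []  # per-frame motion energy (a 1-D proxy for the attention map)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Stage 1: key points of interest and their motion
        # (Shi-Tomasi corners tracked with Lucas-Kanade optical flow).
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=7)
        frame_energy = 0.0
        if pts is not None:
            nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            good = status.ravel() == 1
            disp = np.linalg.norm(nxt[good] - pts[good], axis=2).ravel()

            # Stage 2 (simplified): keep only fast-moving points, treating
            # them as a single coarse foreground region of interest.
            moving = disp[disp > motion_thresh]
            frame_energy = float(moving.sum())

        energy.append(frame_energy)
        prev_gray = gray
    cap.release()
    if not energy:
        return []

    # Stage 3: evaluate behavior over time by smoothing the attention signal.
    energy = np.convolve(energy, np.ones(15) / 15.0, mode="same")
    if energy_thresh is None:
        energy_thresh = energy.mean() + energy.std()

    # Stage 4: extract contiguous above-threshold frame ranges as highlights.
    highlights, start = [], None
    for i, e in enumerate(energy):
        if e > energy_thresh and start is None:
            start = i
        elif e <= energy_thresh and start is not None:
            highlights.append((start, i - 1))
            start = None
    if start is not None:
        highlights.append((start, len(energy) - 1))
    return highlights
```

A call such as `detect_highlights("session.mp4")` would return a list of (start_frame, end_frame) pairs; the paper's actual grouping of points into regions and its attention-map construction are more elaborate than this per-frame scalar proxy.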
