Abstract

In this work, we create two new video datasets for the task of temporal Cricket stroke extraction. The two datasets, namely, the Highlights dataset (with approx. 117K frames) and the Generic dataset (with approx. 1.93M frames), comprise Cricket telecast videos collected from available online sources and down-sampled to 360×640 resolution at 25 FPS. These untrimmed videos have been manually annotated with temporal Cricket strokes under a viewpoint invariance assumption. We construct two learning-based localization pipelines, one dependent (Constrained) and one independent (Unconstrained) of our viewpoint labeling assumption. The Unconstrained pipeline fine-tunes a pretrained C3D model with GRU training in disconnected and connected modes, while our Constrained pipeline uses boundary detection with first-frame classification to generate the temporal localizations. Two post-processing steps, filtering and boundary correction, are also discussed, which help improve the overall accuracy. A modified evaluation metric, Weighted Mean TIoU, for the single-category temporal localization problem is also presented and compared with evaluations under the standard mAP metric (threshold ≥0.5) on the created datasets. The best weighted mean TIoU of our method was 0.9376 and 0.7145 on the Highlights and Generic test partitions, respectively. Moreover, we compare our baseline method with 3D Segment CNNs and Temporal Recurrent Networks (TRNs), which have state-of-the-art results on the THUMOS 2014 dataset.
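For intuition about the Weighted Mean TIoU metric mentioned above, the sketch below illustrates a temporal IoU between predicted and ground-truth stroke segments and a weighted mean over videos. This is not the authors' code; the weighting by per-video ground-truth stroke count and the best-match assignment are assumptions for illustration, since the abstract does not spell out the exact definition.

```python
# Minimal sketch of temporal IoU (TIoU) and a weighted mean over videos.
# Segments are (start_frame, end_frame) tuples; the weighting scheme is assumed.

def temporal_iou(pred, gt):
    """TIoU between a predicted and a ground-truth segment."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def weighted_mean_tiou(per_video_preds, per_video_gts):
    """Best-match TIoU per ground-truth stroke, averaged per video,
    then weighted by the video's number of ground-truth strokes (assumed)."""
    total, weight_sum = 0.0, 0
    for preds, gts in zip(per_video_preds, per_video_gts):
        if not gts:
            continue
        video_tiou = sum(
            max((temporal_iou(p, g) for p in preds), default=0.0) for g in gts
        ) / len(gts)
        total += video_tiou * len(gts)
        weight_sum += len(gts)
    return total / weight_sum if weight_sum else 0.0

# Example: one video with two annotated strokes and two predicted segments.
print(weighted_mean_tiou([[(10, 50), (100, 140)]], [[(12, 48), (95, 150)]]))
```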
