Abstract

Existing video summarisation techniques are largely generic, since they generally overlook the purpose the summary is meant to serve. In contrast with this mainstream work, we observe that the same video can be summarised for many different purposes. Accordingly, we consider a novel perspective: summaries with a purpose. This work is an attempt both to call attention to this neglected aspect of video summarisation research and to illustrate and explore it with two concrete purposes, focusing on first-person-view videos. The proposed purpose-oriented summarisation techniques are framed under the common (frame-level) scoring and selection paradigm, and have been tested on two egocentric datasets, BEOID and EGTEA-Gaze+. The necessary purpose-specific evaluation metrics are also introduced. The proposed approach is compared with two purpose-agnostic summarisation baselines. On the one hand, a partially agnostic method uses the scores obtained by the proposed approach, but follows a standard generic frame selection technique. On the other hand, the fully agnostic method does not use any purpose-based information, and relies on generic concepts such as diversity and representativeness. The experimental results show that the proposed approaches compare favourably with both baselines. More specifically, the purpose-specific approach generally produces summaries with the best compromise between summary length and purpose-specific metrics. Interestingly, the results of the partially agnostic baseline also tend to be better than those of the fully agnostic one. These observations provide strong evidence for the advantage and relevance of purpose-specific summarisation techniques and evaluation metrics, and encourage further work on this important subject.
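The frame-level scoring and selection paradigm mentioned above can be illustrated with a minimal sketch. This is not the authors' method: it assumes that per-frame scores are already available and applies a plain top-k selection under a length budget. The function name `select_frames`, the `ratio` parameter and the example scores are all hypothetical.

```python
from typing import List, Sequence


def select_frames(scores: Sequence[float], ratio: float) -> List[int]:
    """Return the indices of the highest-scoring frames, in temporal order.

    Assumption: `scores` holds one relevance score per frame, and `ratio`
    is the fraction of frames the summary may keep (hypothetical parameter).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    budget = int(len(scores) * ratio)  # maximum number of frames to keep
    # Rank frame indices by descending score; ties favour the earlier frame.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Restore temporal order so the summary plays back chronologically.
    return sorted(ranked[:budget])


# Illustrative scores for a 10-frame video (made-up values).
scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6, 0.5, 0.0]
summary = select_frames(scores, ratio=0.3)
print(summary)
```

In this sketch the scoring step is where purpose-specific information would enter, while the selection step is generic; swapping either step independently mirrors the distinction between purpose-specific and partially agnostic summarisers drawn in the abstract.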

Highlights

  • Video summarisation has been investigated for both structured third-person point of view (Money & Agius, 2008) and for unstructured, first-person perspective (del Molino, Tan, Lim, & Tan, 2017)

  • Since we are interested in evaluating how purpose specificity helps in generating purpose-specific summaries, we propose baselines that are partly similar to the purpose-specific summarisers but lack the purpose-related information at some point, which effectively makes them more purpose-agnostic

  • For a similar ratio (Cf=0.20), KTS(p∗) gets SC = 0.76, significantly higher than SC = 0.48 with Deep Semantic Features (DSF). These relative performances are consistent with how purpose-aware each of these baselines is: KTS is partially purpose-aware since it shares the scores produced by general-interest purpose-oriented summarisation (GiPo) but uses a purpose-agnostic selection algorithm, whereas DSF is totally purpose-agnostic


Summary

Introduction

Video summarisation has been investigated for both structured third-person point of view (Money & Agius, 2008) and for unstructured, first-person (egocentric) perspective (del Molino, Tan, Lim, & Tan, 2017). A range of approaches has been explored, from supervised methods that learn from available ground-truth summaries produced by human subjects (Zhao, Li, & Lu, 2018) to unsupervised ones that rely on heuristics such as diversity, sparsity or representativeness (Mahasseni, Lam, & Todorovic, 2017; Zhou, Qiao, & Xiang, 2018a). Innovative proposals address the difficulty of obtaining paired video-summary data by learning from unpaired sets (Rochan & Wang, 2019), and the elusive but critical problem of summarisation evaluation has been revisited (Abdalla, Menezes, & Oliveira, 2019; Kaushal, Kothawade, Tomar, Iyer, & Ramakrishnan, 2021; Otani, Nakashima, Rahtu, & Heikkila, 2019).

