Abstract
Wearable cameras make it easy to acquire long, unstructured egocentric videos. In this context, temporal video segmentation methods can improve the indexing, retrieval and summarization of such content. Although past research has investigated methods for temporal segmentation of egocentric videos according to different criteria (e.g., motion, location or appearance), many of these methods do not explicitly enforce any form of temporal coherence. Moreover, evaluations have generally been performed using frame-based measures, which account only for the overall correctness of predicted frames and overlook the structure of the produced segmentation. In this paper, we investigate how a Hidden Markov Model based on an ad-hoc transition matrix can be exploited to obtain a more accurate segmentation from frame-based predictions in the context of location-based segmentation of egocentric videos. We introduce a segment-based evaluation measure which strongly penalizes over-segmented and under-segmented results. Experiments show that using a Hidden Markov Model for temporal smoothing greatly improves temporal segmentation results and outperforms current video segmentation methods designed for both third-person and first-person videos.
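The temporal smoothing idea can be illustrated with a minimal sketch, which is not the implementation used in the paper. It assumes that per-frame class probabilities from some frame-level classifier are used directly as emission scores, that the prior over locations is uniform, and that the ad-hoc transition matrix keeps the current location with probability `1 - eps` and spreads `eps` evenly over the other locations. The names `make_transition`, `viterbi_smooth` and `eps` are hypothetical, and the sketch requires at least two classes and `0 < eps < 1`.

```python
import math


def make_transition(num_classes, eps):
    """Assumed 'sticky' transition matrix: stay with prob. 1 - eps, else switch."""
    stay = 1.0 - eps
    switch = eps / (num_classes - 1)
    return [[stay if i == j else switch for j in range(num_classes)]
            for i in range(num_classes)]


def viterbi_smooth(frame_probs, eps=1e-3):
    """Return the most likely label sequence given per-frame class probabilities."""
    num_classes = len(frame_probs[0])
    log_a = [[math.log(p) for p in row]
             for row in make_transition(num_classes, eps)]
    # Floor the probabilities to avoid log(0) on overconfident predictions.
    log_b = [[math.log(max(p, 1e-12)) for p in frame] for frame in frame_probs]

    # Initialisation with a uniform prior over classes.
    score = [-math.log(num_classes) + log_b[0][k] for k in range(num_classes)]
    back = []  # back[t - 1][j] = best predecessor of state j at frame t

    for t in range(1, len(frame_probs)):
        new_score, pointers = [], []
        for j in range(num_classes):
            best_i = max(range(num_classes),
                         key=lambda i: score[i] + log_a[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_a[best_i][j] + log_b[t][j])
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(range(num_classes), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# One isolated misclassified frame inside a stable segment.
noisy = [[0.9, 0.1]] * 3 + [[0.4, 0.6]] + [[0.9, 0.1]] * 3
smoothed = viterbi_smooth(noisy, eps=0.01)
```

In this toy input, a per-frame argmax would produce a spurious one-frame segment, whereas the smoothed path stays in the first class. A sustained change in the predictions is still followed, because remaining in the wrong state costs more than a single transition.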