Abstract

Key-frame extraction for first-person vision (FPV) videos is a core technology for selecting important scenes and preserving memorable moments from our daily activities. The main difficulty in selecting key frames is the scene instability caused by the head-mounted cameras used to capture FPV videos. Because head-mounted cameras tend to shake frequently, the frames in an FPV video are noisier than those in a third-person vision (TPV) video. However, most existing algorithms for key-frame extraction mainly focus on handling the stable scenes in TPV videos, and key-frame extraction techniques for noisy FPV videos remain immature. Moreover, most key-frame extraction algorithms rely only on visual information from FPV videos, even though our visual experience in daily activities is closely tied to human motion. To capture the dynamically changing scenes in FPV videos, it is essential to integrate motion with visual information. In this paper, we propose a novel key-frame extraction method for FPV videos that uses multi-modal sensor signals to reduce noise and detect salient activities by projecting the signals onto a common space with canonical correlation analysis (CCA). We show that the two proposed multi-sensor integration models for key-frame extraction (a sparse-based model and a graph-based model) work well on the common space. The experimental results obtained using various datasets suggest that the proposed key-frame extraction techniques improve the precision of extraction and the coverage of entire video sequences.
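As a rough illustration of the CCA-based common-space projection described above, the following sketch (not the authors' implementation) uses scikit-learn's CCA to embed per-frame visual features and motion-sensor features into a shared space. The feature dimensions, the number of canonical components, and the averaging of the two projected views are assumptions made for the example.

```python
# Minimal sketch of projecting visual and motion features onto a common space
# with CCA. Feature dimensions and the number of components are illustrative.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_frames = 500
visual_feats = rng.standard_normal((n_frames, 128))  # e.g., per-frame appearance features
motion_feats = rng.standard_normal((n_frames, 12))   # e.g., accelerometer/gyroscope statistics

cca = CCA(n_components=8)                  # dimensionality of the common space (assumed)
cca.fit(visual_feats, motion_feats)
vis_proj, mot_proj = cca.transform(visual_feats, motion_feats)

# One common-space representation per frame, here taken as the mean of the two views.
common_space = 0.5 * (vis_proj + mot_proj)
print(common_space.shape)                  # (500, 8)
```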

Highlights

  • First-person vision (FPV) videos captured by head-mounted wearable cameras are useful for understanding daily life activities [1], [2]

  • Unconstrained FPV videos often contain insignificant objects, such as a ceiling or a floor, whereas third-person vision (TPV) videos record experiences worth remembering through manual operation that focuses on specific interesting scenes

  • We show that the proposed multi-sensor integration is effective for key-frame extraction from FPV videos under both sparse-based and graph-based models


Summary

INTRODUCTION

First-person vision (FPV) videos captured by head-mounted wearable cameras are useful for understanding daily life activities [1], [2]. We present a key-frame extraction method for FPV videos that uses multi-sensor information beyond video frames, while most existing methods use only video information [3]–[19]. We assume that motion information expresses the detailed hand or head movement that visual information does not capture. To associate their features, we embed multi-sensor data into a common vector space [20]–[27] using probabilistic canonical correlation analysis (PCCA) [28]. We show that the proposed multi-sensor integration is effective for key-frame extraction from FPV videos under both sparse-based and graph-based models, and that it can improve key-frame extraction performance across different methods. We also expand the experimental results by adding more videos to the dataset used in our conference papers, introducing another new dataset, and providing quantitative comparisons with existing methods.
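The paper performs key-frame selection on this common space through sparse-based and factor-graph-based models. As a much simpler stand-in, the sketch below clusters the common-space features and picks the frame nearest each centroid; the function name, the use of k-means, and the number of key frames are illustrative assumptions, not the paper's method.

```python
# Illustrative stand-in for key-frame selection on the common space:
# cluster common-space features and take the frame closest to each centroid.
# This is NOT the paper's sparse-based or factor-graph-based model.
import numpy as np
from sklearn.cluster import KMeans

def select_key_frames(common_space: np.ndarray, n_key_frames: int = 10) -> np.ndarray:
    """Return indices of the frames nearest to each cluster centroid."""
    km = KMeans(n_clusters=n_key_frames, n_init=10, random_state=0).fit(common_space)
    key_idx = []
    for centroid in km.cluster_centers_:
        dists = np.linalg.norm(common_space - centroid, axis=1)
        key_idx.append(int(np.argmin(dists)))
    return np.array(sorted(set(key_idx)))

# Usage with the `common_space` array from the previous sketch:
# key_frames = select_key_frames(common_space, n_key_frames=10)
```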

RELATED WORKS
PROJECTION WITH MULTI-SENSOR INTEGRATION
FACTOR-GRAPH-BASED KEY-FRAME EXTRACTION
EXPERIMENTAL SETTINGS
METRICS
SPARSE MODEL
CONCLUSION
