Abstract

The analysis of video acquired with a wearable camera is a challenge that the multimedia community is facing with the proliferation of such sensors in various applications. In this paper, we focus on the problem of automatic visual place recognition in a weakly constrained environment, targeting the indexing of video streams by topological place recognition. We propose to combine several machine learning approaches in a time-regularized framework for image-based place recognition indoors. The framework combines the power of multiple visual cues and integrates the temporal continuity information of video. We extend it with a computationally efficient semi-supervised method that leverages unlabeled video sequences for improved indexing performance. The proposed approach was applied to challenging video corpora. Experiments on a public and a real-world video sequence database show the gain brought by the different stages of the method.

Highlights

  • Due to recent achievements in the miniaturization of cameras and their embedding in smart devices, the number of video sequences captured using such wearable cameras has increased substantially

  • Experiments carried out thus far show that unlabeled data is useful for image-based place recognition

  • We demonstrated that a better manifold leveraging unlabeled data can be learned using a semi-supervised Laplacian Support Vector Machine (SVM) under the assumption of low-density class separation

Summary

Introduction

Due to recent achievements in the miniaturization of cameras and their embedding in smart devices, the number of video sequences captured using such wearable cameras has increased substantially. Recordings captured using a wearable camera depict an inside-out view, close to the subjective view of the camera wearer. This is a unique source of information, with applications such as a memory refresh aid or an additional source of information for the analysis of various activities and behavior-related events in a healthcare context. This often comes at the price of content with very high variability, rapid camera displacement, and poorly constrained environments in which the person moves. Location is an important piece of contextual information that restricts the number of possible ongoing activities. Obtaining this information directly from the video stream is an interesting application in multimedia processing, since no additional equipment such as GPS or RFID is needed. This may even be a constraint, since access to such modalities is limited in practice by the available devices, and the installation of any invasive equipment in the environment (such as a home) may not be welcome.
