Abstract

Thanks to the availability and increasing popularity of wearable devices such as GoPro cameras, smart phones, and glasses, we have access to a plethora of videos captured from first person perspective. Surveillance cameras and Unmanned Aerial Vehicles (UAVs) also offer tremendous amounts of video data recorded from top and oblique view points. Egocentric and surveillance vision have been studied extensively but separately in the computer vision community. The relationship between these two domains, however, remains unexplored. In this study, we make the first attempt in this direction by addressing two basic yet challenging questions. First, having a set of egocentric videos and a top-view surveillance video, does the top-view video contain all or some of the egocentric viewers? In other words, have these videos been shot in the same environment at the same time? Second, if so, can we identify the egocentric viewers in the top-view video? These problems can become extremely challenging when videos are not temporally aligned. Each view, egocentric or top, is modeled by a graph and the assignment and time-delays are computed iteratively using the spectral graph matching framework. We evaluate our method in terms of ranking and assigning egocentric viewers to identities present in the top-view video over a dataset of 50 top-view and 188 egocentric videos captured under different conditions. We also evaluate the capability of our proposed approaches in terms of temporal alignment. The experiments and results demonstrate the capability of the proposed approaches in terms of jointly addressing the temporal alignment and assignment tasks.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call