Abstract

The emergence of new wearable technologies, such as action cameras and smart glasses, has driven the use of the first-person perspective in computer applications. This field is now attracting the attention and investment of researchers aiming to develop methods to process first-person vision (FPV) video. Current approaches combine particular image features and quantitative methods to accomplish specific objectives, such as object detection, activity recognition, and user–machine interaction. FPV-based navigation is necessary in some special areas where the Global Positioning System (GPS) or other radio-wave strength methods are blocked, and it is especially helpful for visually impaired people. In this paper, we propose a hybrid structure with a convolutional neural network (CNN) and local image features to achieve FPV pedestrian navigation. A novel end-to-end trainable global pooling operator, called AlphaMEX, has been designed to improve the scene classification accuracy of CNNs. A scale-invariant feature transform (SIFT)-based tracking algorithm is employed for movement estimation and trajectory tracking of the person through each frame of FPV images. Experimental results demonstrate the effectiveness of the proposed method. The top-1 error rate of the proposed AlphaMEX-ResNet outperforms the original ResNet (k = 12) by 1.7% on the ImageNet dataset. The CNN-SIFT hybrid pedestrian navigation system reaches an average absolute error of 0.57 m, which is adequate accuracy for pedestrian navigation. Both positions and movements can be well estimated by the proposed pedestrian navigation algorithm with a single wearable camera.

Highlights

  • Portable cameras, able to record dynamic, high-quality first-person video, have become common equipment among sportsmen, policemen, etc., over the last ten years

  • The proposed first-person vision (FPV)-based pedestrian navigation system is generally useful, and necessary in areas where the Global Positioning System (GPS) or other radio-wave strength methods are blocked; this is especially true for visually impaired people, who require a localization method that can work in any urban area, ranging from underground to narrow open-sky pavements enclosed by tall buildings

Summary

Introduction

Portable cameras, able to record dynamic, high-quality first-person video, have become common equipment among sportsmen, policemen, etc., over the last ten years. These devices represent commercial attempts to record experiences from a first-person perspective. This technological trend follows from the academic results obtained in the late 1990s [1], combined with people's growing interest in recording their daily activities. First-person vision (FPV) [2] is arguably the most commonly used term, but other names, such as egocentric vision and ego-vision, have recently grown in popularity. There are many objectives of FPV analysis, such as object recognition and tracking [3,4,5,6], activity recognition [7,8,9,10,11], and environment mapping [12,13,14,15].
