Abstract

In this paper, we consider visual localization from two perspectives, active and passive. We define active localization as helping a device estimate the location of an object or region of interest, and passive localization as helping a device estimate its own location in the environment. To offer some insight into visual localization, we carried out two explorations of active localization and, more importantly, upgraded each of them from active to passive localization by exploiting additional geometric information. To produce unconstrained and accurate 2D location estimates of an object of interest, we built an active localization system that fuses detection, tracking and recognition. Based on recognition, we proposed a collaborative strategy that lets detection and tracking enhance each other, yielding better 2D location estimates. Meanwhile, to actively estimate the semantic location of a visual region of interest, we employed recent state-of-the-art lightweight CNN models designed for efficiency and trained two of them on a large place dataset for scene recognition. Furthermore, using depth information from an RGB-D camera, we extended the active system for 2D object localization into a passive system that estimates the 3D location of the device relative to the object of interest. We first estimated the 3D location of the object in the device's coordinate system, and then deduced the location of the device relative to the object in the world coordinate system under an appropriate assumption. Evaluations, both subjective on an RGB-D sequence captured in a lab environment and practical on a robotic platform in an office environment, indicated that the improved system is suitable for an autonomous following robot.
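The two-step 3D estimate described above can be illustrated with a minimal sketch. It assumes a standard pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, and it assumes, purely for illustration, that the world frame is centred on the object with axes parallel to the camera frame. The abstract does not state which assumption the authors actually used, and the function names here are hypothetical.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Map a pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera (no lens distortion).
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def device_relative_to_object(p_obj_cam: Point3D) -> Point3D:
    """Position of the device in an object-centred frame.

    Illustrative assumption: the object-centred frame has the same axis
    orientation as the camera frame, so the device position is simply the
    negated object position.
    """
    x, y, z = p_obj_cam
    return (-x, -y, -z)


# Example: object centre detected at pixel (420, 240), 2 m away.
p_obj = backproject(420.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_dev = device_relative_to_object(p_obj)
```

In a following robot, a vector such as `p_dev` (or equivalently `p_obj`) could feed a controller that keeps the object at a fixed distance and bearing; that use is likewise an assumption of this sketch.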
Likewise, the active system for coarse semantic location estimation of a visual region of interest was extended into a passive system for fine location estimation of the device, given a 3D map of the visited environment. From the perspective of place recognition, we first adopted one of the efficient CNN models previously trained for semantic location estimation as a base network to generate CNN features, both for retrieving candidate loops in the map and for checking the geometric consistency of the retrieved loops; the true loops were then used to deduce the fine location of the device in the environment. Comparison with state-of-the-art results showed that the extended system is adequate for long-term robotic autonomy. Having achieved favorable performance, the four explorations presented here support the insights into visual localization that we set out to provide.
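The candidate-loop retrieval step can be sketched as a nearest-neighbour search over feature vectors. This is a minimal illustration that assumes cosine similarity as the matching score and uses hypothetical names and thresholds (`retrieve_candidate_loops`, `top_k`, `min_similarity`). The abstract does not specify the similarity measure, and the geometric consistency check that follows retrieval is not shown.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidate_loops(query: Sequence[float],
                             map_features: Sequence[Sequence[float]],
                             top_k: int = 3,
                             min_similarity: float = 0.8
                             ) -> List[Tuple[int, float]]:
    """Return up to top_k (map index, similarity) pairs, best match first.

    Only map entries whose similarity to the query reaches min_similarity
    are kept; each survivor would still need a geometric consistency check
    before being accepted as a true loop.
    """
    scored = [(i, cosine_similarity(query, f)) for i, f in enumerate(map_features)]
    kept = [s for s in scored if s[1] >= min_similarity]
    kept.sort(key=lambda s: s[1], reverse=True)
    return kept[:top_k]


# Toy features standing in for CNN descriptors of three map keyframes.
toy_map = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
candidates = retrieve_candidate_loops([1.0, 0.0, 0.0], toy_map)
```

With real CNN descriptors, which have hundreds or thousands of dimensions, the same logic would normally be vectorised or backed by an approximate nearest-neighbour index rather than a Python loop.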
