Abstract

This paper presents a novel method for real-time 3D object detection and tracking in monocular images. The method builds a map of a user-specified object from a video sequence and stores it for later 3D object detection and tracking. Its main advantage is that it does not require an existing 3D model of the object. Instead, it first detects the target object using a state-of-the-art deep learning-based object detector and then constructs the object's map using visual Simultaneous Localization and Mapping (vSLAM). Each map needs to be built only once, and maps of multiple different objects can be stored. A fast method is proposed to recognize the object in the map with the aid of deep learning-based detection. The method needs only one camera and is robust in cluttered environments. The multiple-map mode allows pre-reconstructed maps to be reused. Experimental results show that the method achieves accurate, fast, and robust detection and tracking.
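
The multiple-map mode can be illustrated with a minimal sketch. It assumes that each stored map is indexed by the class label returned by the detector; the abstract does not say how maps are indexed, so this keying, along with the names `ObjectMap`, `MapStore`, `add`, and `select`, is hypothetical and not the authors' implementation. Map contents are reduced to a plain list of 3D points.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMap:
    """Hypothetical container for one object's pre-built map."""
    label: str                                   # class label of the mapped object
    points: list = field(default_factory=list)   # 3D map points as (x, y, z) tuples


class MapStore:
    """Hypothetical store holding one pre-built map per object label."""

    def __init__(self):
        self._maps = {}

    def add(self, object_map):
        # Each map is built once and then kept for reuse.
        self._maps[object_map.label] = object_map

    def select(self, detected_label):
        # Assumption: the detector's label picks the map to track against.
        # Returns None when no map was built for that object.
        return self._maps.get(detected_label)


store = MapStore()
store.add(ObjectMap("mug", [(0.0, 0.0, 1.0)]))
store.add(ObjectMap("keyboard", [(0.1, 0.2, 0.9), (0.3, 0.2, 0.9)]))

selected = store.select("mug")       # map for a detected "mug"
missing = store.select("bottle")     # no map was built for "bottle"
```

Under this assumption, selecting a map is a dictionary lookup, so the cost of choosing among stored maps does not grow with the number of objects.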
