Abstract

The authors present a robust and extendable localization system for monocular images. To have both robustness toward noise factors and extendibility to unfamiliar scenes simultaneously, our system combines traditional content-based image retrieval structure with CNN feature extraction model to localize monocular images. The core model of the system is a deep CNN feature extraction model. The feature extraction model can map an image to a d-dimension space where image pairs in the real word have smaller Euclidean distances. The feature extraction model is achieved using a deep Convnet modified from GoogLeNet. A special way to train the feature extraction model is proposed in the article using localization results from Cambridge Landmarks dataset. Through experiments, it is shown that the system is robust to noise factors supported by high level CNN features. Furthermore, the authors show that the system has a powerful extendibility to other unfamiliar scenes supported by a feature extract model's generic property and structure.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.