Abstract

We present a new approach for matching urban object instances across multiple ground-level images, with the ultimate goal of city-scale object mapping at high positioning accuracy. What makes this task challenging are the strong changes in viewpoint, differing lighting conditions, the high similarity of neighboring objects, and variability in scale. We propose to turn object instance matching into a learning task in which image appearance and geometric relationships between views fruitfully interact. Our approach constructs a Siamese convolutional neural network that learns to match two views of the same object among many candidate image cut-outs. In addition to image features, we propose using location information about the camera and the object to support the image evidence via soft geometric constraints. We compare our method to existing patch-matching methods and demonstrate its advantage over the state of the art. This takes us one step closer to the ultimate goal of city-wide object mapping from street-level imagery for the benefit of city administration.
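To make the notion of soft geometric constraints concrete, the following minimal sketch derives simple geometric cues for a candidate pair of detections from camera geotags and viewing directions. All function and variable names here are illustrative assumptions, not the paper's API; the exact cues used in the paper may differ.

```python
import math

def geometric_features(cam1, cam2, heading1, heading2):
    """Soft geometric cues for a candidate pair of detections.

    cam1/cam2: (lat, lon) of the two camera positions in degrees;
    heading1/heading2: viewing direction towards each detection, in
    degrees from north. Names are illustrative, not the paper's API.
    """
    # Approximate metric offsets on a local tangent plane (adequate at
    # city scale, where the Earth's curvature is negligible).
    lat0 = math.radians((cam1[0] + cam2[0]) / 2)
    dx = (cam2[1] - cam1[1]) * 111_320 * math.cos(lat0)  # east, metres
    dy = (cam2[0] - cam1[0]) * 111_320                   # north, metres
    baseline = math.hypot(dx, dy)

    # If both detections show the same object, the two viewing rays
    # should nearly intersect; the angle between them is a soft
    # consistency cue rather than a hard acceptance test.
    h1, h2 = math.radians(heading1), math.radians(heading2)
    dtheta = abs((h1 - h2 + math.pi) % (2 * math.pi) - math.pi)

    return [baseline, dtheta, math.sin(h1), math.cos(h1),
            math.sin(h2), math.cos(h2)]
```

Such a feature vector can be fed to the network alongside the two image cut-outs, letting the model weigh geometric plausibility against appearance similarity instead of enforcing hard epipolar constraints.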

Highlights

  • Automated methods for mobile mapping to generate inventories of urban objects at large scale have received significant attention lately [1,2,3,4,5]

  • We propose to augment image evidence with soft geometric constraints to learn object instance matching in street-level images end-to-end and at large scale

  • Google Street View and Mapillary provide access to a huge amount of street-level imagery that can be used to construct very large datasets for deep learning approaches. We use the former to build a multi-view dataset of street trees and a dataset provided by the latter for traffic signs. Both are employed as testbeds to learn instance matching with soft geometric constraints based on a Siamese CNN model

Summary

Introduction

Automated methods for mobile mapping to generate inventories of urban objects at large scale have received significant attention lately [1,2,3,4,5]. Google Street View and Mapillary provide access to a huge amount of street-level imagery that can be used to construct very large datasets for deep learning approaches. We use the former to build a multi-view dataset of street trees and a dataset provided by the latter for traffic signs. Both are employed as testbeds to learn instance matching with soft geometric constraints based on a Siamese CNN model. Our main contribution is a modified Siamese CNN architecture that learns geometric constellations from multi-view acquisitions jointly with the appearance information in the images; a minimal sketch of such an architecture follows. This will later help our main pipeline to better geo-position objects in the wild and to subsequently assign them predefined semantic classes. We highlight some examples in the literature per field and draw comparisons between these problems and ours.
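The sketch below illustrates one plausible way to realize such a modified Siamese CNN in PyTorch: a shared backbone embeds the two image cut-outs, a small MLP embeds the geometric cues, and a fused head scores the pair. The backbone choice, layer sizes, and all names are assumptions for illustration and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GeoSiamese(nn.Module):
    """Siamese matcher that fuses appearance with soft geometric cues.

    One ResNet backbone with shared weights embeds both image cut-outs
    (the Siamese part); a small MLP embeds the geometric features
    (e.g. camera baseline, viewing-ray angle). Sizes are illustrative.
    """
    def __init__(self, geo_dim=6, embed_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone                     # shared weights = Siamese
        self.geo_mlp = nn.Sequential(
            nn.Linear(geo_dim, 32), nn.ReLU(), nn.Linear(32, 32))
        self.head = nn.Sequential(                   # fused match score
            nn.Linear(2 * embed_dim + 32, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, img_a, img_b, geo):
        fa = self.backbone(img_a)                    # appearance embedding, view A
        fb = self.backbone(img_b)                    # appearance embedding, view B
        g = self.geo_mlp(geo)                        # soft geometric evidence
        logit = self.head(torch.cat([fa, fb, g], dim=1))
        return logit.squeeze(1)                      # same-instance logit

# Training could minimise a binary cross-entropy (or contrastive) loss
# over matching / non-matching candidate pairs:
#   loss = nn.functional.binary_cross_entropy_with_logits(
#       model(img_a, img_b, geo), labels)
```

Because the geometric branch enters only as an additional input to the scoring head, the geometry acts as soft evidence that the network can learn to weigh, rather than as a hard filter applied before matching.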

Related Work
Instance Matching with Soft Geometric Constraints
Model Architecture
Loss Functions
Experiments
Datasets
Evaluation Strategy
Does Geometry Help?
Results
Conclusions