Abstract
Many applications, such as autonomous navigation, urban planning, and asset monitoring, rely on accurate information about objects and their geolocations. In this paper, we propose the automatic detection and computation of the coordinates of recurring stationary objects of interest using street view imagery. Our processing pipeline relies on two fully convolutional neural networks: the first segments objects in the images, while the second estimates their distance from the camera. To geolocate all the detected objects coherently, we propose a novel custom Markov random field model. The novelty of the resulting pipeline is the combined use of monocular depth estimation and triangulation to enable automatic mapping of complex scenes in which multiple, visually similar objects of interest are present simultaneously. We experimentally validate the effectiveness of our approach on two object classes: traffic lights and telegraph poles. The experiments report high object recall rates and a position precision of approximately 2 m, which approaches the precision of single-frequency GPS receivers.
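To make the geometric idea concrete, the following is a minimal sketch of the two position cues the abstract combines: a position obtained from a single view using an estimated depth, and a position obtained by intersecting view rays from two camera locations. It is not the Markov random field model proposed in the paper. The function names, the local planar east/north frame in metres, and the convention that bearings are measured clockwise from north are all assumptions made for illustration.

```python
import math


def ray_direction(bearing_deg):
    """Unit vector (east, north) for a bearing measured clockwise from north.

    The bearing convention is an assumption of this sketch.
    """
    b = math.radians(bearing_deg)
    return (math.sin(b), math.cos(b))


def position_from_depth(camera, bearing_deg, depth_m):
    """Single-view estimate: step depth_m metres from the camera along the ray."""
    dx, dy = ray_direction(bearing_deg)
    return (camera[0] + depth_m * dx, camera[1] + depth_m * dy)


def triangulate(cam1, bearing1_deg, cam2, bearing2_deg, eps=1e-9):
    """Two-view estimate: intersect two view rays in a local planar frame.

    Solves cam1 + t1 * d1 = cam2 + t2 * d2 for t1 and t2.
    Returns None if the rays are (nearly) parallel or if the
    intersection lies behind either camera.
    """
    d1 = ray_direction(bearing1_deg)
    d2 = ray_direction(bearing2_deg)
    bx, by = cam2[0] - cam1[0], cam2[1] - cam1[1]

    # Determinant of the 2x2 system [[d1x, -d2x], [d1y, -d2y]].
    det = -d1[0] * d2[1] + d2[0] * d1[1]
    if abs(det) < eps:
        return None

    # Cramer's rule for the ray parameters.
    t1 = (-bx * d2[1] + d2[0] * by) / det
    t2 = (d1[0] * by - d1[1] * bx) / det
    if t1 < 0 or t2 < 0:
        return None

    return (cam1[0] + t1 * d1[0], cam1[1] + t1 * d1[1])


if __name__ == "__main__":
    # Two hypothetical camera positions 10 m apart, both seeing the same object.
    print(triangulate((0.0, 0.0), 45.0, (10.0, 0.0), 315.0))
    # The same object located from the first camera alone, using a depth estimate.
    print(position_from_depth((0.0, 0.0), 45.0, math.hypot(5.0, 5.0)))
```

When several visually similar objects appear in the same scene, many pairs of rays intersect at points where no object exists, which is why a depth cue and a joint model over all candidate intersections are useful for discarding spurious matches.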
Highlights
The rapid development of computer vision and machine learning techniques in recent decades has spurred growing interest in the automatic analysis of huge image datasets accumulated by companies and individual users worldwide
Image databases with Global Positioning System (GPS) information, such as Google Street View (GSV) and images posted on social networks like Twitter, are regularly updated, provide dense coverage of most populated areas, and can be queried seamlessly using APIs
We test our pipeline on GSV imagery for detection of two object types: traffic lights and telegraph poles
Summary
The rapid development of computer vision and machine learning techniques in recent decades has spurred growing interest in the automatic analysis of huge image datasets accumulated by companies and individual users worldwide. Image databases with Global Positioning System (GPS) information, such as Google Street View (GSV) and images posted on social networks like Twitter, are regularly updated, provide dense coverage of most populated areas, and can be queried seamlessly using APIs. In particular, 360° time-stamped geolocated panoramic images captured by cameras mounted on vehicles or carried by pedestrians are publicly accessible from GSV, Bing Streetside, Mapillary, OpenStreetCam, etc. Tens of billions of street view panoramas covering millions of kilometers of roads and depicting street scenes at regular intervals are available [1,2]. This vast amount of image data makes it possible to address a multitude of mapping problems by exploring areas remotely, dramatically reducing the costs of in situ inventory, mapping, and monitoring campaigns.