
GIScience 2016 Short Paper Proceedings

Outlier Detection in OpenStreetMap Data using the Random Forest Algorithm

Richard Wen, Claus Rinner
Department of Geography and Environmental Studies, Ryerson University
350 Victoria St., Toronto, Ontario, M5B 2K3, Canada
Email: {rwen, crinner}@ryerson.ca

Abstract

OpenStreetMap (OSM) data consist of digitized geographic objects with semantic tags assigned by volunteer contributors. The tags describe the geographic objects in a way that is understandable by both humans and computers. Variability in contributor behaviour creates reliability concerns for the tagging quality of OSM data. The detection of irregular contributions may improve OSM data quality and editing tools. This research applies the random forest algorithm to geospatial variables in order to detect outliers without ground-truth reference data and thereby direct human inspection. An application to OSM data for Toronto, Ontario, was effective in revealing abnormal amenity tagging of school and hospital objects.

1. Introduction

OpenStreetMap (OSM) is an online platform enabling registered volunteers to contribute geospatial data by digitizing point-, line-, or polygon-shaped geographic objects and annotating them with tags referring to common feature classes such as roads and restaurants (Haklay 2008). OSM tags are semantically structured as key-value pairs, where the key refers to a broad class of geographic objects and the value details the specific geographic object being tagged (Ballatore et al. 2013). Examples of tags are amenity=school, highway=residential, and building=house.

The open and flexible nature of OSM tagging leads to varying contribution behaviour across communities (Mooney et al. 2010). This variability creates concerns about the quality of OSM data and about community standards for OSM tagging. Quality control and corrections rely heavily on human interaction, which raises additional questions about the reliability of OSM data. Finally, the experience of each volunteer contributor affects the tagging quality of the geographic objects they edit, as experienced contributors are more familiar with the tagging norms of the area being edited. Although OSM is an effective and efficient platform for generating large volumes of geospatial data, it is affected by reliability, quality, and completeness issues.

The aim of this paper is to examine the ability of an automated machine learning algorithm, the random forest algorithm, to support manual human inspection and minimize bias in OSM data editing. An automated algorithm improves the detection of abnormal tagging behaviour, avoids the bias of human judgement, and reduces the time required to search through masses of tagged geographic objects. A combination of human knowledge and experience with the logical accuracy of machines could improve OSM tagging quality and standards, and enable the development of advanced editing tools.

2. Data and Methods
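To give a concrete sense of how a random forest can flag irregular tags without ground-truth reference data, the sketch below trains a classifier to predict each object's amenity value from geospatial variables and flags objects whose observed tag receives a low predicted probability. This is an illustrative reconstruction, not the authors' implementation: the function name, feature matrix, probability threshold, and the use of scikit-learn are all assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def flag_tag_outliers(X, tags, threshold=0.1, n_estimators=500, random_state=0):
    # Train a random forest to predict the observed amenity tag of each OSM
    # object from its geospatial variables (the feature set X is a placeholder
    # assumption, e.g. coordinates and counts of nearby objects).
    forest = RandomForestClassifier(n_estimators=n_estimators,
                                    random_state=random_state)
    forest.fit(X, tags)

    # Probability the forest assigns to each object's *observed* tag.
    # (Out-of-bag probabilities, via oob_score=True and
    # forest.oob_decision_function_, would avoid in-sample optimism.)
    proba = forest.predict_proba(X)
    class_index = {c: i for i, c in enumerate(forest.classes_)}
    observed = proba[np.arange(len(tags)),
                     [class_index[t] for t in tags]]

    # Objects whose observed tag looks improbable given their geospatial
    # context are flagged for human inspection.
    return observed < threshold

Called as flag_tag_outliers(X, tags) on, say, school and hospital objects, the function returns a boolean mask of candidates for manual review; the threshold controls how conservative the flagging is.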
