Abstract

Missing data in Volunteered Geographic Information (VGI) are an unavoidable consequence of data collection by non‐experts, guided by only vague and informal mapping guidelines. While various Missing Value Imputation (MVI) techniques have been proposed as data cleansing strategies, they have primarily targeted numerical data attributes in non‐spatial databases. There remains a significant gap in methods for imputing nominal attribute values (e.g., Street Name) in map databases. Here, we present an imputation algorithm called the Membership Imputation Algorithm (MIA), targeting spatial databases and enabling imputation of nominal values in spatially referenced records. By targeting membership classes of spatial objects, MIA harnesses spatio‐temporal characteristics of data and proposes efficient heuristics to impute the class name (i.e., a membership). Experimental results show that the proposed algorithm is able to impute the membership with high levels of accuracy (over 94%) when assigning Street Name(s), across highly diverse regional contexts. MIA is effective in challenging spatial contexts such as street intersections. Our research serves as a first step in highlighting the effectiveness of spatio‐temporal measures as a key driver for nominal imputation techniques.

Highlights

  • Many real-world data sets are dirty (Prasad et al., 2011)

  • We propose the Membership Imputation Algorithm (MIA), which imputes the nominal attributes of an OSM relation for any map feature, by evaluating the spatial and temporal proximity of the neighboring map features that already belong to an existing relation

  • Entities at intersections are analyzed separately because their neighborhoods present a unique challenge: the neighbors are distributed across multiple Associated Street Relation (ASR) membership classes, unlike the neighborhood around a single street, which is the pattern that dominates the overall data set
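
The idea in the highlights above, imputing a missing membership from the spatial and temporal proximity of neighbors that already belong to a relation, can be illustrated with a minimal sketch. This is not the authors' MIA, whose heuristics are not described in this excerpt. It is a generic proximity-weighted vote, and the Feature class, the impute_street function, the parameters k, dist_scale and time_scale, and the additive cost are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical map feature; field names are illustrative, not OSM's schema."""
    x: float                      # projected easting in metres (assumed)
    y: float                      # projected northing in metres (assumed)
    timestamp: float              # edit time in seconds (assumed)
    street: Optional[str] = None  # membership class; None when missing


def impute_street(target: Feature, neighbours: List[Feature], k: int = 5,
                  dist_scale: float = 50.0,
                  time_scale: float = 86400.0) -> Optional[str]:
    """Return the street name with the largest proximity-weighted vote
    among the k closest labelled neighbours, or None if none are labelled."""
    labelled = [n for n in neighbours if n.street is not None]
    if not labelled:
        return None

    def cost(n: Feature) -> float:
        # Assumed combined cost: scaled spatial distance plus scaled time gap.
        spatial = math.hypot(n.x - target.x, n.y - target.y) / dist_scale
        temporal = abs(n.timestamp - target.timestamp) / time_scale
        return spatial + temporal

    nearest = sorted(labelled, key=cost)[:k]
    votes = defaultdict(float)
    for n in nearest:
        # Closer neighbours (in space and time) contribute larger votes.
        votes[n.street] += 1.0 / (1.0 + cost(n))
    return max(votes, key=votes.get)


target = Feature(0.0, 0.0, 0.0)
neighbours = [
    Feature(10.0, 0.0, 0.0, "High Street"),
    Feature(20.0, 0.0, 3600.0, "High Street"),
    Feature(200.0, 0.0, 0.0, "Mill Lane"),
    Feature(1.0, 0.0, 0.0),  # unlabelled, so it cannot vote
]
result = impute_street(target, neighbours)
```

In this toy setup the two nearby "High Street" features outweigh the distant "Mill Lane" feature. At an intersection the votes would be split across several classes, which is consistent with the highlights treating intersections as a separate case.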


Summary

Introduction

Many real-world data sets are dirty (Prasad et al., 2011). The term dirty data refers to data sets with issues such as missing or incorrect records or values (Simoudis, Livezey, & Kerber, 1995), non-standard representations (Williams, 1997), outliers (Hawkins, He, Williams, & Baxter, 2002), and duplicate values (Hernández & Stolfo, 1998). OpenStreetMap (OSM) (https://www.openstreetmap.org), the most prominent VGI data source, is heavily affected by map features with incomplete attribute data (Davidovic, Mooney, Stoimenov, & Minghini, 2016). This issue is common in databases without a strict schema or data definition rules. In OSM, the free tagging system allows contributors to use an unlimited number of attributes to describe a map feature. This free-form nature of tagging, coupled with a lack of adherence to community guidelines (https://wiki.openstreetmap.org/wiki/Tagging), results in considerable missing data for features
