A novel similarity measure for spatial entity resolution based on data granularity model: Managing inconsistencies in place descriptions

Mohammad Khodizadeh-Nahari,Nasser Ghadiri,Jörg-Rüdiger Sack,Ahmad Baraani-Dastjerdi

doi:10.1007/s10489-020-01959-y

Abstract

Tremendous amounts of data are generated every day by different sources and stored in heterogeneous databases. Providing an integrated view by fusion of data is essential to enhance data utilization. An indispensable type of data is spatial data, with diverse application domains, including GIS, e-commerce, military, and tourism. The concept of location forms a key part of user-generated data with serious challenges, including uncertainty. A particular location may have different names, and conversely, various locations may have the same name. Furthermore, geographical coordinates of locations may not be expressed accurately in datasets. More challenges also exist that have received less attention. Various data sources might describe locations in different levels of detail. This increases data inconsistency and decreases the quality of data fusion. This paper focuses on spatial data granulation to deal with this variety. If these diversities are not taken into consideration, the different descriptions of a location may be interpreted differently and, in turn, not be fused. The contribution of this paper are: (a) Introducing a granular approach to measure the similarity between two place description for managing apparent differences. The proposed method improves the quality of the geocoding and data fusion phases, (b) Introducing a novel data blocking method to decrease pairwise comparisons based on geographical features. For result evaluation, we developed a dataset from two real aviation accident datasets. The evaluation shows that the quality of entity recognition and data fusion improved by using our proposed data granulation technique.

Full Text