Abstract

Similarity plays an important role in many data mining tasks and information retrieval processes. Most of the supervised, semi-supervised, and unsupervised learning algorithms depend on using a dissimilarity function that measures the pair-wise similarity between the objects within the dataset. However, traditionally most of the similarity functions fail to adequately treat all the spatial attributes of the geospatial polygons due to the incomplete quantitative representation of structural and topological information contained within the polygonal datasets. In this paper, we propose a new dissimilarity function known as the polygonal dissimilarity function (PDF) that comprehensively integrates both the spatial and the non-spatial attributes of a polygon to specifically consider the density, distribution, and topological relationships that exist within the polygonal datasets. We represent a polygon as a set of intrinsic spatial attributes, extrinsic spatial attributes, and non-spatial attributes. Using this representation of the polygons, PDF is defined as a weighted function of the distance between two polygons in the different attribute spaces. In order to evaluate our dissimilarity function, we compare and contrast it with other distance functions proposed in the literature that work with both spatial and non-spatial attributes. In addition, we specifically investigate the effectiveness of our dissimilarity function in a clustering application using a partitional clustering technique (e.g. $$k$$ k -medoids) using two characteristically different sets of data: (a) Irregular geometric shapes determined by natural processes, i.e., watersheds and (b) semi-regular geometric shapes determined by human experts, i.e., counties.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call