Abstract

With the rapid growth of social media, the volume of textual content is increasing quickly. Unstructured texts are a rich source of latent spatial information, and extracting that information is useful in query processing, geographical information retrieval (GIR), and recommender systems. In this paper, we propose a novel approach to inferring spatial information from salient non-spatial features in text corpora. We propose two methods, DCS and RCS, to represent place-based concepts, and two measures, Shannon entropy and Moran’s I, to calculate the degree of geo-indicativeness of terms in texts. We compare the methodology with a Latent Dirichlet Allocation (LDA) approach to estimate the improvement in accuracy. We evaluated the methods on a dataset of rental property advertisements in Iran and a dataset of Persian Wikipedia articles. The results show that the proposed approach improves the relative accuracy of predictions by about 10% for the rental advertisements and by 13% for the Wikipedia articles. The average distance error is about 13.3 km for the advertisements and 10.3 km for the Wikipedia articles, which makes the method suitable for inferring the general region of a city in which a property is located. The proposed methodology is promising for inferring spatial knowledge from textual content that lacks spatial terms.
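
As a rough illustration of how Shannon entropy can serve as a geo-indicativeness measure, the sketch below computes the entropy of a term's occurrences across regions. It is a minimal example under stated assumptions, not the paper's implementation: the function name `term_region_entropy`, the region labels, and the reading that lower entropy means a more geo-indicative term are all assumptions made here for illustration.

```python
import math
from collections import Counter


def term_region_entropy(region_labels):
    """Shannon entropy (in bits) of a term's occurrences over regions.

    `region_labels` holds one region label per occurrence of the term.
    Assumption for this sketch: a term concentrated in few regions
    (low entropy) is treated as more geo-indicative than a term spread
    evenly over many regions (high entropy).
    """
    counts = Counter(region_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the term has no occurrences")
    # H = -sum(p * log2(p)) over the regions where the term occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical occurrences of two terms, labeled by region
concentrated = ["north", "north", "north", "north"]
spread_out = ["north", "south", "east", "west"]

print(term_region_entropy(concentrated))
print(term_region_entropy(spread_out))
```

In this toy input, the term that appears in a single region has zero entropy, while the term spread evenly over four regions has an entropy of two bits, so the first would be ranked as the more geo-indicative of the two.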
