Abstract

Geographical information has become an important attribute of web documents, since a large proportion of documents on the web contain it. GIR (Geographical Information Retrieval) systems can identify this information and automatically extract the geographical focus of a document, thereby supporting geo-related queries in information retrieval. GIR has consequently become an active topic in both the GIS and IR (Information Retrieval) communities. To exploit the geographical information within web documents and return more accurate results for geo-related queries, a GIR system needs to determine the geographical focus of each document, on which a spatial index can then be built for more accurate and efficient processing of spatial IR queries. Among the steps within a GIR system, determining the geographical focus of each document is therefore an essential one. In response, this paper presents a novel and promising algorithm for this task. We first briefly introduce SASEIC (Spatial-Aware Search Engine in Chinese), a GIR prototype system we have implemented to support our research in the GIR field. We then analyze the various PNPs (Place Name Patterns) that can occur within documents. Next, we present the principle and steps of the algorithm, which is based on the hierarchical structure of the placenames within the documents to be retrieved. Finally, we report the evaluation results for the proposed algorithm, draw our conclusions, and outline important directions for future research.
