Abstract

Nowadays, a huge quantity of information is stored in digital format. A great portion of this information is constituted by textual and unstructured documents, where geographical references are usually given by means of place names. A common problem with textual information retrieval is represented by polysemous words, that is, words can have more than one sense. This problem is present also in the geographical domain: place names may refer to different locations in the world. In this paper we investigate the use of our word sense disambiguation technique in the geographical domain, with the aim of resolving ambiguous place names. Our technique is based on WordNet conceptual density. Due to the lack of a reference corpus tagged with WordNet senses, we carried out the experiments over a set of 1,210 place names extracted from the SemCor corpus that we named GeoSemCor and made publicly available. We compared our method with the most‐frequent baseline and the enhanced‐Lesk method, which previously has not been tested in large contexts. The results show that a better precision can be achieved by using a small context (phrase level), whereas a greater coverage can be obtained by using large contexts (document level). The proposed method should be tested with other corpora, due to the fact that our experiments evidenced the excessive bias towards the most‐frequent sense of the GeoSemCor.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.