Abstract

Recognizing geo-entity relations in rich text requires robust and effective keyword extraction. Compared with supervised learning methods, unsupervised methods have attracted more attention because they can capture dynamic feature variation in text and discover additional relation types. Frequency-based keyword extraction methods have been widely studied, but they are difficult to apply directly to geo-entity keyword extraction because geo-entity relations are sparsely distributed in text. In addition, there are few studies on Chinese keyword extraction. This paper proposes a context-enhanced keyword extraction method. First, the contexts of geo-entities are enhanced to reduce the sparseness of terms. Second, two well-known frequency-based statistical methods (DF and Entropy) are used to build a large-scale corpus automatically from the enhanced contexts. Third, lexical features and their weights are determined statistically from the corpus to make terms more distinguishable. Finally, all terms in the enhanced contexts are scored with the lexical features, and the most important terms are selected as the keywords of geo-entity pairs. Experiments are conducted on a large collection of real Chinese web texts. Compared with DF and Entropy, the proposed method improves precision by 41% and 36%, respectively, in discovering sparsely distributed keywords, and generates an additional 60% of correct keywords for geo-entity relation recognition.
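
To make the two frequency-based baselines concrete, the following is a minimal sketch of how DF and Entropy scores might be computed over tokenized contexts. It is not the paper's implementation. The abstract does not give the exact formulas, so the sketch assumes that DF means document frequency (the number of contexts containing a term) and that Entropy means the Shannon entropy of a term's occurrences across contexts. The function names and the toy English tokens are hypothetical; real input would be word-segmented Chinese contexts around geo-entity pairs.

```python
import math
from collections import Counter


def document_frequency(contexts):
    """Count, for each term, how many contexts contain it at least once.

    Assumption: "DF" in the abstract is read as document frequency.
    """
    df = Counter()
    for tokens in contexts:
        df.update(set(tokens))  # set() so a term counts once per context
    return df


def term_entropy(contexts):
    """Shannon entropy (in bits) of each term's occurrences across contexts.

    Assumption: the abstract does not define its entropy measure; this is
    one common formulation, used here only for illustration.
    """
    per_context = [Counter(tokens) for tokens in contexts]
    totals = Counter()
    for counts in per_context:
        totals.update(counts)

    entropy = {}
    for term, total in totals.items():
        h = 0.0
        for counts in per_context:
            c = counts.get(term, 0)
            if c:
                p = c / total
                h -= p * math.log2(p)
        entropy[term] = h
    return entropy


# Hypothetical toy contexts (already tokenized).
contexts = [
    ["river", "flows", "into", "lake"],
    ["river", "located", "in", "province"],
    ["city", "located", "in", "province", "province"],
]

df = document_frequency(contexts)
ent = term_entropy(contexts)
```

In this toy input, a term that appears in only one context (such as "lake") gets a DF of 1 and an entropy of 0. This illustrates why purely frequency-based scores struggle with sparsely distributed relation keywords, the problem the context enhancement step is meant to address.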
