Abstract

Measuring the semantic similarity between words is an important component of various tasks on the web, such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring the semantic similarity between two words (or entities) remains a challenging task. This survey proposes an empirical method that estimates the semantic similarity of two words using page counts and text snippets retrieved from a web search engine. Specifically, the technique defines various word co-occurrence measures using page counts and integrates them with lexical patterns extracted from text snippets. To identify the numerous semantic relations that exist between two given words, a novel pattern extraction algorithm and a pattern clustering algorithm are proposed. The optimal combination of page-count-based co-occurrence measures and lexical pattern clusters is learned using support vector machines.
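
To make the idea of page-count-based co-occurrence measures concrete, the following is a minimal sketch rather than the method's actual definitions. The abstract does not name the measures, so the choice of Jaccard, Dice, and pointwise mutual information here is an assumption, as are the function names and the parameter `n` (a hypothetical estimate of the number of pages indexed by the search engine). The inputs `p`, `q`, and `pq` stand for the page counts returned for the first word, the second word, and the conjunctive query containing both.

```python
import math


def web_jaccard(p: int, q: int, pq: int) -> float:
    """Jaccard coefficient computed from page counts (illustrative)."""
    denom = p + q - pq
    return pq / denom if denom > 0 else 0.0


def web_dice(p: int, q: int, pq: int) -> float:
    """Dice coefficient computed from page counts (illustrative)."""
    denom = p + q
    return 2 * pq / denom if denom > 0 else 0.0


def web_pmi(p: int, q: int, pq: int, n: int) -> float:
    """Pointwise mutual information from page counts (illustrative).

    n is an assumed estimate of the total number of indexed pages.
    Returns 0.0 when any count is zero, since the logarithm is undefined.
    """
    if p == 0 or q == 0 or pq == 0 or n == 0:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


if __name__ == "__main__":
    # Hypothetical page counts, chosen only to exercise the functions.
    p, q, pq, n = 100, 50, 25, 1000
    print(web_jaccard(p, q, pq))
    print(web_dice(p, q, pq))
    print(web_pmi(p, q, pq, n))
```

Scores like these could then serve as features, alongside lexical pattern cluster features, for the support vector machine mentioned above. How the features are actually combined is not specified in the abstract.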
