Wikipedia Link Structure Research Articles

Many applications in cognitive science and artificial intelligence use semantic similarity and relatedness to solve difficult tasks such as information retrieval, word sense disambiguation, and text classification. Several approaches for evaluating concept similarity and relatedness based on WordNet or Wikipedia have been proposed. WordNet-based methods rely on highly precise knowledge but have limited lexical coverage. In contrast, Wikipedia-based models achieve broader coverage but sacrifice knowledge quality. In this paper, we therefore develop a comprehensive semantic similarity and relatedness method based on both WordNet and Wikipedia. To improve on the accuracy of existing measures, we describe concepts by combining various taxonomic and non-taxonomic features of WordNet, including glosses, lemmas, examples, sister terms, derivations, holonyms/meronyms, and hypernyms/hyponyms, with Wikipedia glosses and hyperlinks. We present a novel technique for extracting ‘is-a’ and ‘part-whole’ relationships between concepts using the Wikipedia link structure. The technique identifies taxonomic and non-taxonomic relationships between concepts and offers dense vector representations of concepts. To fully exploit the semantic attributes of WordNet and Wikipedia, the proposed method integrates their semantic knowledge at the feature level, combining semantic similarity and relatedness into a single comprehensive measure. The experimental results demonstrate the effectiveness of the proposed method over state-of-the-art measures on various gold standard benchmarks.
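
As a rough illustration of feature-level integration, the sketch below merges features from two sources into one bag per concept and compares concepts by cosine similarity. It is only a minimal sketch under stated assumptions: the feature lists are hand-written toy data, the names `WORDNET_FEATURES`, `WIKIPEDIA_FEATURES`, and `fused_vector` are hypothetical, and the vectors are sparse counts rather than the dense representations the abstract mentions. It is not the authors' method.

```python
import math
from collections import Counter

# Hypothetical, hand-written toy features. A real system would extract
# these from WordNet (glosses, hypernyms, ...) and Wikipedia (glosses, links).
WORDNET_FEATURES = {
    "car": ["motor_vehicle", "wheel", "engine"],
    "bicycle": ["wheeled_vehicle", "wheel", "pedal"],
}
WIKIPEDIA_FEATURES = {
    "car": ["Transport", "Road", "Engine"],
    "bicycle": ["Transport", "Road", "Cycling"],
}


def fused_vector(concept):
    """Merge both sources into one bag of features, tagged by origin."""
    bag = Counter()
    for feature in WORDNET_FEATURES.get(concept, []):
        bag["wn:" + feature] += 1
    for feature in WIKIPEDIA_FEATURES.get(concept, []):
        bag["wiki:" + feature] += 1
    return bag


def cosine(u, v):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


score = cosine(fused_vector("car"), fused_vector("bicycle"))
```

Prefixing each feature with its source keeps a WordNet feature and a Wikipedia feature with the same spelling from being conflated, which is one simple way to combine the two resources in a single vector space.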

Wikipedia connects its articles through manually defined semantic relations, collectively called the Wikipedia hyperlink (link) structure. The existing Wikipedia link-based semantic similarity (SS) and semantic relatedness (SR) computation models, such as the Wikipedia one-way link (WOLM) model and the Wikipedia two-way link (WTLM) model, do not assess the strengths of the relationships between a candidate concept and its links (out-links or in-links). These models treat all links as equally important, even though some links are semantically more influential than others and should be given more weight. This uniform treatment reduces the accuracy of these models. This paper presents the Wikipedia bi-linear link (WBLM) model, which extends the previously proposed WOLM and WTLM models. The WBLM model explores the Wikipedia link structure as a semantic graph and discovers the strongly connected (bi-linear) links and weakly connected (out-only or in-only) links of a candidate concept. It improves the link-based vector representations of concepts by assigning weights to their connected links according to the strengths of their semantic associations. The experimental results demonstrate that the proposed WBLM model significantly improves the SS and SR computation accuracy of the WOLM model (by 6.9%, 8%, 24%, 17.3%, 31.2%, 30.6%, 26.5%, and 35.4%) and the WTLM model (by 1.2%, 3.9%, 7.1%, 9.9%, 11%, 6.3%, 12.7%, and 13%), in terms of linear correlations with human judgments on the gold standard benchmarks MC30, RG65, WS203, SimLex, 353All, MTurk287, MTurk771, and MEN3000, respectively. Moreover, this research offers deep insight into the Wikipedia link structure and provides an adequate base for understanding it as a semantic graph.
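
The idea of weighting links by connection strength can be sketched as follows. This is a minimal sketch, assuming a toy link graph and arbitrary weights (2.0 for links present in both directions, 1.0 for one-directional links); the abstract does not give the actual WBLM weighting scheme, and the names `link_vector`, `invert`, and `OUT_LINKS` are hypothetical.

```python
import math


def invert(out_links):
    """Derive in-links from an out-link adjacency map."""
    in_links = {}
    for source, targets in out_links.items():
        for target in targets:
            in_links.setdefault(target, set()).add(source)
    return in_links


def link_vector(concept, out_links, in_links, strong=2.0, weak=1.0):
    """Build a weighted link vector for a concept.

    Links present in both directions get `strong`; links present in only
    one direction get `weak`. Both weight values are illustrative
    assumptions, not values from the paper.
    """
    outs = out_links.get(concept, set())
    ins = in_links.get(concept, set())
    return {
        link: strong if (link in outs and link in ins) else weak
        for link in outs | ins
    }


def cosine(u, v):
    """Cosine similarity between two sparse weighted link vectors."""
    dot = sum(weight * v[link] for link, weight in u.items() if link in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy graph: article title -> set of articles it links to (made-up data).
OUT_LINKS = {
    "Car": {"Vehicle", "Engine", "Road"},
    "Truck": {"Vehicle", "Engine", "Cargo"},
    "Vehicle": {"Car", "Truck"},
    "Engine": {"Car"},
}
IN_LINKS = invert(OUT_LINKS)

car = link_vector("Car", OUT_LINKS, IN_LINKS)
truck = link_vector("Truck", OUT_LINKS, IN_LINKS)
relatedness = cosine(car, truck)
```

Setting `strong` equal to `weak` reduces the sketch to treating every link as equally important, which is the behaviour the abstract attributes to the earlier link-based models.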

Articles published on Wikipedia Link Structure

Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia

Wikipedia bi-linear link (WBLM) model: A new approach for measuring semantic similarity and relatedness between linguistic concepts using Wikipedia link structure

Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods

Research on Domain Term Dictionary Construction Based on Chinese Wikipedia

“Introducing Capisco: a semantically-enhanced search and discovery system for large-scale text corpora”

An Efficient Approach For Semantically-Enhanced Document Clustering By Using Wikipedia Link Structure

Wikipedia-based topic clustering for microblogs

Entity ranking in Wikipedia: utilising categories, links and topic difficulty prediction

Extraction of Bilingual Terminology from a Multilingual Web-based Encyclopedia
