Abstract
Explicit Semantic Analysis (ESA) utilizes the Wikipedia knowledge base to represent the semantics of a word by a vector in which every dimension refers to an explicitly defined concept, such as a Wikipedia article. ESA inherently assumes that Wikipedia concepts are orthogonal to each other; therefore, it considers two words to be related only if they co-occur in the same articles. However, two words can be related to each other even if they appear separately in related articles rather than co-occurring in the same articles. This motivates extending the ESA model to consider the relatedness between the explicit concepts (i.e., Wikipedia articles in a Wikipedia-based implementation) when computing textual relatedness. In this paper, we present Non-Orthogonal ESA (NESA), which represents finer-grained semantics of a word as a vector of explicit concept dimensions, where every such concept dimension further constitutes a semantic vector built in another vector space. Thus, NESA considers the concept correlations in computing the relatedness between two words. We explore different approaches to compute the concept correlation weights and compare these approaches with other existing methods. Furthermore, we evaluate our model NESA on several word relatedness benchmarks, showing that it outperforms state-of-the-art methods.
Highlights
The significance of quantifying relatedness between two natural language texts has been demonstrated in various tasks in information retrieval (IR), natural language processing (NLP), and related fields
In order to investigate the performance of these concept relatedness measures, we evaluate them on an entity relatedness benchmark called KORE (Hoffart et al., 2012), as a Wikipedia article title generally refers to an entity
We presented Non-Orthogonal Explicit Semantic Analysis (NESA), which introduces the relatedness between the explicit concepts into the ESA model for computing semantic relatedness, without compromising the explicit property of the ESA concept space
Summary
The significance of quantifying relatedness between two natural language texts has been demonstrated in various tasks in information retrieval (IR), natural language processing (NLP), and related fields. Distributional semantic models (DSMs) have attracted much attention as they utilize available document collections like Wikipedia and do not depend upon human expertise (Harris, 1954). DSMs represent the semantics of a word by transforming it into a high-dimensional distributional vector in a predefined concept space. Many models have been proposed that derive this concept space using either explicit or implicit concepts. Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2007) utilizes concepts that are explicitly derived under human cognition, such as Wikipedia concepts (articles). Latent Semantic Analysis (LSA) derives a latent concept space by performing dimensionality reduction (Landauer et al., 1998). ESA represents the semantics of a word with a high-dimensional vector over the Wikipedia concepts.

Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015), pages 92–100, Denver, Colorado, June 4–5, 2015
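The contrast between standard ESA and the NESA idea can be sketched numerically. The following is a minimal toy illustration, not the paper's implementation: the concept vectors and the concept-correlation matrix `C` are made-up values, and the NESA score is computed as a normalized bilinear form over the explicit dimensions.

```python
import numpy as np

# Toy setup: 4 hypothetical explicit Wikipedia concepts.
# ESA would derive these weights from TF-IDF over article text.
u = np.array([0.9, 0.1, 0.0, 0.0])  # word A: appears only in concepts 0 and 1
v = np.array([0.0, 0.0, 0.8, 0.2])  # word B: appears only in concepts 2 and 3

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Standard ESA treats concept dimensions as orthogonal, so words that
# never share a concept get zero relatedness.
esa_score = cosine(u, v)  # 0.0 for these vectors

# NESA-style: a symmetric concept-correlation matrix C (illustrative
# values, 1s on the diagonal) relates the explicit dimensions, so the
# relatedness becomes a normalized bilinear form u^T C v.
C = np.array([
    [1.0, 0.3, 0.7, 0.2],
    [0.3, 1.0, 0.1, 0.0],
    [0.7, 0.1, 1.0, 0.4],
    [0.2, 0.0, 0.4, 1.0],
])
nesa_score = float(u @ C @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(esa_score, nesa_score)  # nesa_score > 0 despite no shared concepts
```

The key point the sketch shows: because concept 0 (where word A occurs) is correlated with concept 2 (where word B occurs), NESA assigns a nonzero relatedness to the pair even though their ESA vectors are orthogonal.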