Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli

doi:10.1016/j.artint.2016.07.005

Open Access

https://doi.org/10.1016/j.artint.2016.07.005

Abstract

Owing to the need for a deep understanding of linguistic items, semantic representation is considered to be one of the fundamental components of several applications in Natural Language Processing and Artificial Intelligence. As a result, semantic representation has been one of the prominent research areas in lexical semantics over the past decades. However, due mainly to the lack of large sense-annotated corpora, most existing representation techniques are limited to the lexical level and thus cannot be effectively applied to individual word senses. In this paper we put forward a novel multilingual vector representation, called Nasari, which not only enables accurate representation of word senses in different languages, but also provides two main advantages over existing approaches: (1) high coverage, including both concepts and named entities, and (2) comparability across languages and linguistic levels (i.e., words, senses and concepts), thanks to the representation of linguistic items in a single unified semantic space and in a joint embedded space, respectively. Moreover, our representations are flexible, can be applied to multiple applications and are freely available at http://lcl.uniroma1.it/nasari/. As an evaluation benchmark, we opted for four different tasks, namely, word similarity, sense clustering, domain labeling, and Word Sense Disambiguation, for each of which we report state-of-the-art performance on several standard datasets across different languages.
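
To make the idea of comparing words through their sense vectors concrete, here is a minimal sketch. It is not the Nasari method: the abstract does not specify how the vectors are built or which similarity measure is used, so cosine similarity, the "closest pair of senses" strategy, and the toy vectors below are all assumptions chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(senses_a, senses_b):
    """Score two words by their best-matching pair of senses (an assumed strategy)."""
    return max((cosine(a, b) for a in senses_a for b in senses_b), default=0.0)


# Hypothetical, hand-made sense vectors; real vectors would come from a resource.
bank_senses = [
    {"money": 0.9, "loan": 0.7},   # "bank" as a financial institution
    {"river": 0.8, "shore": 0.6},  # "bank" as the side of a river
]
coast_senses = [
    {"shore": 0.9, "sea": 0.5},
]

score = word_similarity(bank_senses, coast_senses)
print(f"bank vs. coast: {score:.3f}")
```

In this toy setup only the river sense of "bank" shares a dimension with "coast", so it alone determines the word-level score. This shows why sense-level vectors can separate meanings that a single word-level vector would blend.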

Journal: Artificial Intelligence
Publication Date: Aug 16, 2016
Citations: 182
License type: publisher-specific-oa

Similar Papers

A Unified Multilingual Semantic Representation of Concepts
José Camacho-Collados ... Mohammad Taher Pilehvar
01 Jan 2015

Unsupervised Approach to Word Sense Disambiguation in Malayalam
K.P Sruthi Sankar ... V Jayan
Procedia Technology | VOL. 24
01 Jan 2015

Attention-based Stacked Bidirectional Long Short-term Memory Model for Word Sense Disambiguation
Yujia Sun ... Jan Platoš
ACM Transactions on Asian and Low-Resource Language Information Processing
18 May 2023

An approach to reduce part of speech ambiguity using semantically annotated lexicon definitions
Andrei Minca ... Stefan Diaconescu
01 Sep 2012
