Building Compact Entity Embeddings Using Wikidata

Mohamed Lubani, Shahrul Azman Mohd Noah

doi:10.18517/ijaseit.8.4-2.6831

Abstract

Representing natural language sentences has always been a challenge in statistical language modelling. Atomic discrete representations of words make it difficult to represent semantically related sentences. Other sentence components such as phrases and named-entities should be recognized and given representations as units instead of individual words. Different entity senses should be assigned different representations even when they share identical words. In this paper, we focus on building the vector representations (embeddings) of named-entities from their contexts to facilitate the task of ontology population, where named-entities need to be recognized and disambiguated in natural language text. Given a list of target named-entities, Wikidata is used to compensate for the lack of a labelled corpus to build the contexts of all target named-entities as well as all their senses. Description text and semantic relations with other named-entities are considered when building the contexts from Wikidata. To avoid noisy and uninformative features in the embeddings generated from artificially built contexts, we propose a method that builds compact entity representations, sharpening entity embeddings by removing irrelevant features and emphasizing the most descriptive ones. An extended version of the Continuous Bag-of-Words (CBOW) model is used to build the joint vector representations of words and named-entities using Wikidata contexts. Each entity context is then represented by a subset of elements that maximizes the chances of keeping the most descriptive features about the target entity. The final entity representations are built by compressing the embeddings of the chosen subset using a deep stacked autoencoder model. Cosine similarity and the t-SNE visualisation technique are used to evaluate the final entity vectors. Results show that semantically related entities are clustered near each other in the vector space. Entities that appear in similar contexts are assigned similar compact vector representations.
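
The cosine-similarity evaluation mentioned in the abstract can be illustrated with a minimal sketch. The entity names, the three-dimensional toy vectors, and the helper `nearest_entities` are hypothetical and chosen only for illustration; they are not taken from the paper, whose vectors come from the CBOW and autoencoder pipeline it describes.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def nearest_entities(target, vectors, k=2):
    """Rank all other entities by cosine similarity to `target` (hypothetical helper)."""
    scores = [
        (name, cosine_similarity(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Assumed toy data: two senses of "Paris" plus one related city.
toy_vectors = {
    "Paris (city)": [0.9, 0.1, 0.0],
    "London (city)": [0.8, 0.2, 0.1],
    "Paris (mythology)": [0.1, 0.9, 0.2],
}

for name, score in nearest_entities("Paris (city)", toy_vectors):
    print(f"{name}: {score:.3f}")
```

With these assumed vectors, the city sense of "Paris" ranks closer to "London (city)" than to the mythological sense, which is the kind of sense-aware clustering the abstract reports.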

Journal: International Journal on Advanced Science, Engineering and Information Technology
Publication Date: Sep 26, 2018
License type: CC BY-SA

