Abstract
We describe a neural network model that jointly learns distributed representations of texts and knowledge base (KB) entities. Given a text in the KB, we train our proposed model to predict entities that are relevant to the text. Our model is designed to be generic, with the ability to address various NLP tasks with ease. We train the model using a large corpus of texts and their entity annotations extracted from Wikipedia. We evaluated the model on three important NLP tasks (i.e., semantic textual similarity, entity linking, and factoid question answering) involving both unsupervised and supervised settings, and achieved state-of-the-art results on all three tasks. Our code and trained models are publicly available for further academic research.
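To make the training objective concrete, the following is a minimal sketch (not the authors' released code) of the kind of joint model the abstract describes: a text is encoded into a fixed-length vector and trained to score its annotated KB entities above sampled negative entities. The mean-pooling text encoder, the embedding dimensionality, and the negative-sampling softmax loss are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEntityEncoder(nn.Module):
    def __init__(self, vocab_size, num_entities, dim=300):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.entity_emb = nn.Embedding(num_entities, dim)
        # maps the pooled word vectors into the entity vector space
        self.proj = nn.Linear(dim, dim)

    def encode_text(self, word_ids):
        # word_ids: (batch, seq_len); mean-pool word vectors, then project
        pooled = self.word_emb(word_ids).mean(dim=1)
        return self.proj(pooled)

    def forward(self, word_ids, entity_ids):
        # entity_ids: (batch, 1 + num_negatives); column 0 is the gold entity
        text_vec = self.encode_text(word_ids)               # (batch, dim)
        ent_vecs = self.entity_emb(entity_ids)              # (batch, k, dim)
        scores = torch.bmm(ent_vecs, text_vec.unsqueeze(2)).squeeze(2)
        # softmax cross-entropy against the gold entity at index 0
        targets = torch.zeros(scores.size(0), dtype=torch.long,
                              device=scores.device)
        return F.cross_entropy(scores, targets)

In this sketch each training example pairs a text with its annotated (gold) entity plus randomly sampled negatives; after training, the text encoder produces generic fixed-length vectors that can be compared with entity vectors or with other text vectors for downstream tasks.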
Highlights
Methods capable of learning distributed representations of arbitrary-length texts, such as sentences and paragraphs, have recently attracted considerable attention (Le and Mikolov, 2014; Kiros et al., 2015; Li et al., 2015; Wieting et al., 2016; Hill et al., 2016b; Kenter et al., 2016).
Our Neural Text-Entity Encoder (NTEE) models outperformed the state-of-the-art models on all datasets in terms of Pearson's r.
We also tested the performance of the NTEE model without the string-similarity features and found that these features contributed to its performance.
Summary
Methods capable of learning distributed representations of arbitrary-length texts (i.e., fixed-length continuous vectors that encode the semantics of texts), such as sentences and paragraphs, have recently attracted considerable attention (Le and Mikolov, 2014; Kiros et al., 2015; Li et al., 2015; Wieting et al., 2016; Hill et al., 2016b; Kenter et al., 2016). These methods aim to learn generic representations that are useful across domains, similar to word embedding methods such as Word2vec (Mikolov et al., 2013b) and GloVe (Pennington et al., 2014). A related line of work learns representations of entities in a knowledge base (KB) such as Wikipedia and Freebase; these methods encode information of entities in the KB into a continuous vector space. We use human-edited entity annotations obtained from Wikipedia (see Table 1) as supervised data of entities relevant to the texts containing these annotations. Our code and trained models are available at https://github.com/studio-ousia/ntee.
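The following small sketch illustrates how such entity annotations can be turned into (text, relevant-entity) training pairs. The record layout and the helper name are hypothetical, for illustration only, and do not reflect the paper's actual preprocessing code.

def make_training_pairs(paragraphs):
    """paragraphs: iterable of (text, annotations), where annotations is a
    list of (mention_span, entity_title) pairs taken from anchor links."""
    pairs = []
    for text, annotations in paragraphs:
        for _span, entity_title in annotations:
            # every entity annotated in a text is treated as relevant to it
            pairs.append((text, entity_title))
    return pairs

example = [(
    "Rome is the capital of Italy.",
    [((0, 4), "Rome"), ((23, 28), "Italy")],
)]
print(make_training_pairs(example))
# [('Rome is the capital of Italy.', 'Rome'),
#  ('Rome is the capital of Italy.', 'Italy')]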