WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset

Luyu Wang, Oriol Vinyals, Yujia Li, Ozlem Aslan

doi:10.18653/v1/2021.textgraphs-1.7

Abstract

We present a new dataset of Wikipedia articles each paired with a knowledge graph, to facilitate the research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (1 or few sentences), thus limiting the capabilities of the models that can be learned on the data. Our new dataset WikiGraphs is collected by pairing each Wikipedia article from the established WikiText-103 benchmark (Merity et al., 2016) with a subgraph from the Freebase knowledge graph (Bollacker et al., 2008). This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text. Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets. We present baseline graph neural network and transformer model results on our dataset for 3 tasks: graph -> text generation, graph -> text retrieval and text -> graph retrieval. We show that better conditioning on the graph provides gains in generation and retrieval quality but there is still large room for improvement.

Highlights

Example text from WikiText-103: “Where the Streets Have No Name” is a song by Irish rock band U2
We present a new dataset of Wikipedia text articles each paired with a relevant knowledge graph (KG), which enables building models that can generate long text conditioned on a graph structured overview of relevant topics, and models that extract or generate graphs from a text description
Our results show that better conditioning on the graph improves the relevance of the generated text and the retrieval quality
The GenWiki dataset (Jin et al, 2020) is automatically constructed by querying KGs in DBpedia with the title ...
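
To make the idea of an article paired with a knowledge graph concrete, here is a minimal sketch of how one record might be represented. The class name `GraphTextPair`, its fields, and the relation names `performed_by` and `appears_on` are illustrative assumptions; they are not the dataset's actual schema or Freebase relation identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class GraphTextPair:
    """Hypothetical record: one article plus its knowledge-graph subgraph."""
    title: str
    text: str
    # Each edge is a (head, relation, tail) triple. Relation names are made up.
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def nodes(self) -> Set[str]:
        """Collect every entity that appears as the head or tail of an edge."""
        heads = {head for head, _, _ in self.edges}
        tails = {tail for _, _, tail in self.edges}
        return heads | tails

    def num_tokens(self) -> int:
        """Whitespace token count; the dataset's real tokenization may differ."""
        return len(self.text.split())


pair = GraphTextPair(
    title="Where the Streets Have No Name",
    text="“Where the Streets Have No Name” is a song by Irish rock band U2",
    edges=[
        ("Where the Streets Have No Name", "performed_by", "U2"),
        ("Where the Streets Have No Name", "appears_on", "The Joshua Tree"),
    ],
)
```

Storing the graph as a list of triples keeps the sketch simple. A real loader would likely use integer node ids and separate feature tables.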

Summary

Introduction

“Where the Streets Have No Name” is a song by Irish rock band U2. It is the opening track from their 1987 album The Joshua Tree and was released as the album’s third single in August 1987.

Annotating KG or text to create paired datasets is expensive, as a good quality annotation requires annotators that understand the content and structure of the text and the corresponding KG.

Graph neural networks (GNNs) (Battaglia et al, 2018; Gilmer et al, 2017) learn representations for graph structured data through message passing.

The length of the text articles averages 3,533.8 tokens and can go up to 26,994 tokens, which is orders of magnitude longer than the text data in previous graph-text paired datasets, which typically contain only a single or a few sentences (Jin et al, 2020; Gardent et al, 2017; Lebret et al, 2016).

The generation models were based on the Transformer-XL architecture and conditioned on the graph through a GNN, making full use of the graph structure and capable of generating very long text comparable to the state-of-the-art.
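
The message passing that GNNs rely on can be illustrated with a toy example. The sketch below is not the model used in the paper. It swaps the learned message and update functions of a real GNN for a plain unweighted mean, and the node names and feature vectors are invented for illustration.

```python
def message_passing_step(features, edges):
    """One round of message passing with mean aggregation.

    features: dict mapping node name -> list of floats (the node's vector)
    edges:    list of directed (source, destination) pairs
    Returns a new dict in which each node's vector is the mean of its own
    vector and the vectors of all nodes that point to it.
    """
    # Every node starts with a "self message" so isolated nodes keep their vector.
    inbox = {node: [vec] for node, vec in features.items()}
    for src, dst in edges:
        inbox[dst].append(features[src])
    # Average the received vectors dimension by dimension.
    return {
        node: [sum(values) / len(messages) for values in zip(*messages)]
        for node, messages in inbox.items()
    }


# Invented toy graph: two entities send messages to the "song" node.
features = {"song": [1.0, 0.0], "band": [0.0, 1.0], "album": [0.0, 0.0]}
edges = [("band", "song"), ("album", "song")]
updated = message_passing_step(features, edges)
```

After one step, the `song` vector mixes in information from `band` and `album`, while the two nodes with no incoming edges are unchanged. Stacking several such steps lets information travel along longer paths in the graph.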

Dataset

[Figure: axis “Nodes per graph”, ticks 50 to 250]

The dataset construction process

The graph part of the data should be relevant

Graph-conditioned Transformer-XL

Implementation details

Main result

Findings

Ablations on sampling configurations


Publication Date: Jan 1, 2021
Citations: 4
License type: cc-by

Similar Papers


Conditional Text Generation for Harmonious Human-Machine Interaction
Bin Guo ... Wei Wu
ACM Transactions on Intelligent Systems and Technology | VOL. 12
26 Feb 2021

TextDream: Conditional Text Generation by Searching in the Semantic Space
Weidi Xu ... Chao Deng
01 Jul 2018

Special issue on knowledge graphs and semantics in text analysis and retrieval
Laura Dietz ... Jeff Dalton
Information Retrieval Journal | VOL. 22
04 Mar 2019
