Abstract

We present a new dataset of Wikipedia articles, each paired with a knowledge graph, to facilitate research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences), which limits the capabilities of the models that can be learned from the data. Our new dataset, WikiGraphs, is collected by pairing each Wikipedia article from the established WikiText-103 benchmark (Merity et al., 2016) with a subgraph from the Freebase knowledge graph (Bollacker et al., 2008). This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text. Both the graphs and the text data are of significantly larger scale than in prior graph-text paired datasets. We present baseline graph neural network and transformer model results on our dataset for three tasks: graph -> text generation, graph -> text retrieval and text -> graph retrieval. We show that better conditioning on the graph provides gains in generation and retrieval quality, but there is still large room for improvement.

Highlights

  • We present a new dataset of Wikipedia text articles, each paired with a relevant knowledge graph (KG). It enables building models that generate long text conditioned on a graph-structured overview of relevant topics, as well as models that extract or generate graphs from a text description

  • Our results show that better conditioning on the graph improves the relevance of the generated text and the retrieval quality

  • The GenWiki dataset (Jin et al., 2020) is automatically constructed by querying KGs in DBpedia by title
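To make "retrieval quality" concrete, here is a minimal sketch of how the two retrieval directions could be scored. It assumes a similarity matrix between graphs and texts and uses recall@K as the metric. Both are illustrative assumptions, since this summary does not say which metric or scoring model was used. The function names and the toy scores are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(scores: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired item appears in the top k.

    scores[i][j] is a similarity between query i and candidate j.
    Assumption: the correct candidate for query i is index i.
    """
    hits = 0
    for i, row in enumerate(scores):
        # Rank candidates by descending score; ties are broken by index.
        ranked = sorted(range(len(row)), key=lambda j: (-row[j], j))
        if i in ranked[:k]:
            hits += 1
    return hits / len(scores)


def transpose(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap the roles of queries and candidates."""
    return [list(col) for col in zip(*scores)]


# Toy similarity scores for 3 graph-text pairs (hypothetical numbers).
# Rows are graphs, columns are texts.
graph_to_text = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.1, 0.2, 0.7],
]
# Text -> graph retrieval uses the same scores with rows and columns swapped.
text_to_graph = transpose(graph_to_text)

print(recall_at_k(graph_to_text, 1))
print(recall_at_k(graph_to_text, 2))
print(recall_at_k(text_to_graph, 1))
```

With one shared score matrix, the two directions differ only in which axis plays the role of the query, which is why a transpose is enough to switch between them in this sketch.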

Summary

Introduction

An example WikiText-103 article begins: “Where the Streets Have No Name” is a song by Irish rock band U2. It is the opening track from their 1987 album The Joshua Tree and was released as the album’s third single in August 1987.

Annotating KGs or text to create paired datasets is expensive, as a good-quality annotation requires annotators who understand the content and structure of the text and the corresponding KG.

Graph neural networks (GNNs) (Battaglia et al., 2018; Gilmer et al., 2017) learn representations for graph-structured data through message passing.

The text articles average 3,533.8 tokens in length and can reach 26,994 tokens, orders of magnitude longer than the text in previous graph-text paired datasets, which typically contain only a single or a few sentences (Jin et al., 2020; Gardent et al., 2017; Lebret et al., 2016).

The generation models were based on the Transformer-XL architecture and conditioned on the graph through a GNN, making full use of the graph structure and capable of generating very long text comparable to the state of the art.
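The message passing that GNNs rely on can be illustrated with a small parameter-free sketch. This is not the model used in the paper: real GNNs apply learned transformations to messages and node states, whereas this example simply averages features. The node names, feature values and the function `message_passing_step` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_step(
    node_feats: Dict[str, Vector],
    edges: List[Tuple[str, str]],
) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing (no learned weights).

    Each node receives the features of its in-neighbours, averages them,
    and then averages that result with its own feature. Nodes without
    incoming edges keep their feature unchanged.
    """
    incoming: Dict[str, List[Vector]] = {node: [] for node in node_feats}
    for src, dst in edges:
        incoming[dst].append(node_feats[src])

    updated: Dict[str, Vector] = {}
    for node, feat in node_feats.items():
        msgs = incoming[node]
        if not msgs:
            updated[node] = list(feat)
            continue
        # Element-wise mean over all incoming messages.
        mean_msg = [sum(vals) / len(msgs) for vals in zip(*msgs)]
        # Combine the node's own state with the aggregated message.
        updated[node] = [(a + b) / 2 for a, b in zip(feat, mean_msg)]
    return updated


# Hypothetical 3-node graph with 2-dimensional features.
feats = {"song": [1.0, 0.0], "band": [0.0, 1.0], "album": [1.0, 1.0]}
edges = [("band", "song"), ("album", "song"), ("song", "album")]

new_feats = message_passing_step(feats, edges)
print(new_feats)
```

Repeating the step lets information travel further: after two rounds, a node's feature reflects neighbours up to two edges away.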

Dataset
[Figure; x-axis: Nodes per graph]
The dataset construction process
The graph part of the data should be relevant
Graph-conditioned Transformer-XL
Implementation details
Main result
Findings
Ablations on sampling configurations
