Abstract
We present a new dataset of Wikipedia articles, each paired with a knowledge graph, to facilitate research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences), thus limiting the capabilities of the models that can be learned on the data. Our new dataset WikiGraphs is collected by pairing each Wikipedia article from the established WikiText-103 benchmark (Merity et al., 2016) with a subgraph from the Freebase knowledge graph (Bollacker et al., 2008). This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text. Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets. We present baseline graph neural network and transformer model results on our dataset for 3 tasks: graph -> text generation, graph -> text retrieval and text -> graph retrieval. We show that better conditioning on the graph provides gains in generation and retrieval quality, but there is still large room for improvement.
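To make the pairing concrete, the sketch below models one graph-text example as a title, an article body, and a set of relation triples. This is an illustrative data structure only; the field names and the Freebase-style relation string are assumptions for illustration, not the dataset's actual serialization format.

```python
from dataclasses import dataclass, field

@dataclass
class GraphTextPair:
    """Illustrative container for one WikiGraphs-style example.

    Field names are assumptions for illustration, not the
    dataset's actual on-disk format.
    """
    title: str
    text: str                                   # full Wikipedia article body
    triples: list = field(default_factory=list)  # (subject, relation, object) edges

pair = GraphTextPair(
    title="Where the Streets Have No Name",
    text='"Where the Streets Have No Name" is a song by Irish rock band U2. ...',
    # Hypothetical Freebase-style triple, for illustration only.
    triples=[("Where the Streets Have No Name", "music.recording.artist", "U2")],
)
print(len(pair.triples))  # 1
```

A real loader would read many such pairs, with graphs typically much larger than this single-edge toy example.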
Highlights
Example WikiText-103 article: “Where the Streets Have No Name” is a song by Irish rock band U2.
We present a new dataset of Wikipedia text articles, each paired with a relevant knowledge graph (KG), which enables building models that can generate long text conditioned on a graph-structured overview of relevant topics, and models that extract or generate graphs from a text description.
The GenWiki dataset (Jin et al., 2020) is automatically constructed by querying KGs in DBpedia with the titles of Wikipedia articles.
Our results show that better conditioning on the graph improves the relevance of the generated text and the retrieval quality.
Summary
“Where the Streets Have No Name” is a song by Irish rock band U2. It is the opening track from their 1987 album The Joshua Tree and was released as the album’s third single in August 1987.

Annotating KGs or text to create paired datasets is expensive, as good quality annotation requires annotators who understand the content and structure of both the text and the corresponding KG. Graph neural networks (GNNs) (Battaglia et al., 2018; Gilmer et al., 2017) learn representations for graph-structured data through a message passing process. The length of the text articles averages 3,533.8 tokens and can go up to 26,994 tokens, which is orders of magnitude longer than the text in previous graph-text paired datasets, which typically contain only a single sentence or a few sentences (Jin et al., 2020; Gardent et al., 2017; Lebret et al., 2016). Our generation models were based on the Transformer-XL architecture and conditioned on the graph through a GNN, making full use of the graph structure and capable of generating very long text comparable to the state of the art.
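The message passing mentioned above can be sketched in a few lines: each node aggregates transformed features from its in-neighbours, then updates its own representation. This is a minimal generic sketch, not the specific GNN used in the paper; the weight matrices and the tanh update are assumptions for illustration.

```python
import numpy as np

def message_passing_step(h, edges, w_msg, w_self):
    """One round of message passing on a directed graph.

    h: (num_nodes, dim) node features; edges: list of (src, dst) pairs.
    w_msg, w_self: (dim, dim) weight matrices (illustrative choices).
    """
    agg = np.zeros_like(h)
    for src, dst in edges:
        agg[dst] += h[src] @ w_msg     # collect messages from in-neighbours
    return np.tanh(h @ w_self + agg)   # update each node with its messages

# Toy graph: 3 nodes in a chain 0 -> 1 -> 2.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 2)]
w = rng.normal(size=(4, 4)) * 0.1
out = message_passing_step(h, edges, w, w)
print(out.shape)  # (3, 4)
```

Stacking several such rounds lets information propagate along longer paths in the graph, which is what allows the text decoder to condition on graph structure rather than on isolated nodes.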