ACL TA-DA: A Dataset for Text Summarization and Generation

Min Su Park, Eunil Park

doi:10.1145/3555776.3577736

Abstract

Selecting appropriate natural language datasets is imperative to achieving good performance on deep learning natural language tasks. Recent state-of-the-art language models are trained on huge corpora to achieve high language understanding performance. In addition, carrying out diverse NLP tasks requires fine-tuning pre-trained language models on task-specific datasets. In this paper, we introduce ACL TA-DA (Association for Computational Linguistics Titles Abstracts DAta), a dataset consisting of 22k English titles and the corresponding abstracts of papers published in ACL. Our ACL TA-DA dataset is suited to two NLP tasks: (1) text summarization and (2) text generation. To demonstrate that our dataset can be widely applied, we train several state-of-the-art text summarization and generation models on it and report their results.
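To make the two task framings concrete, the following minimal Python sketch assumes that each dataset entry is a title/abstract pair, that text summarization maps an abstract (source) to its title (target), and that text generation maps a title (prompt) to its abstract (target). The `PaperRecord` type and the `summarization_pairs` and `generation_pairs` helpers are hypothetical names for illustration only; they are not the dataset's released file format or loader.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PaperRecord:
    """Hypothetical ACL TA-DA entry: one paper title and its abstract."""
    title: str
    abstract: str


def summarization_pairs(records: List[PaperRecord]) -> List[Tuple[str, str]]:
    """Assumed summarization framing: (source=abstract, target=title)."""
    return [(r.abstract, r.title) for r in records]


def generation_pairs(records: List[PaperRecord]) -> List[Tuple[str, str]]:
    """Assumed generation framing: (prompt=title, target=abstract)."""
    return [(r.title, r.abstract) for r in records]


if __name__ == "__main__":
    # Toy record made up for illustration; it is not taken from the dataset.
    records = [
        PaperRecord(
            title="A Toy Title",
            abstract="A toy abstract describing a toy method.",
        )
    ]
    print(summarization_pairs(records))
    print(generation_pairs(records))
```

Under these assumptions the same records serve both tasks; only the direction of the source/target pair changes, which is why a single title/abstract corpus can support both summarization and generation.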

Full Text