A dataset for the Chinese automatic text generation task

Zhang You, Lilin

doi:10.11922/sciencedb.j00001.00358

Abstract

The dataset is stored as CSV files and describes restaurant information. It comprises 17457 meaning representations (MRs), i.e. groups of key-value pairs, and 17246 human-written reference texts. Each MR consists of 3 to 8 Chinese key-value pairs, with keys such as name, food or region and their corresponding values, as shown in Table 3. Of these instances, 15568 are used for training, 1678 for validation, and the remaining 211 for testing. In the training and validation sets, each group of key-value pairs is paired with multiple human-written reference texts, with the aim of providing references that are more natural, informative and diverse than the MRs themselves. The parallel corpus of Chinese key-value pairs and texts was constructed manually through a series of processing steps: collection, cleaning, translation, screening and sorting.

The dataset includes three data files:
(1) trainset.csv, the training set, with 15568 instances;
(2) devset.csv, the validation set, with 1678 instances;
(3) testset.csv, the test set, with 211 instances.

Each instance in the training and validation sets consists of a key-value pair group and a human reference text; instances in the test set contain only the key-value pair group.
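As an illustration of the file layout described above, the following Python sketch reads one split and checks that every MR has 3 to 8 key-value pairs. Several details are assumptions not stated in the description: the column names `mr` and `ref`, the `key[value], key[value]` serialization of MRs (modeled on the English E2E NLG Challenge format), and UTF-8 encoding. The helper names are hypothetical; adjust them to the actual files.

```python
import csv
import io
import re

# Split sizes as stated in the dataset description.
EXPECTED_SIZES = {"trainset.csv": 15568, "devset.csv": 1678, "testset.csv": 211}

# ASSUMPTION: MRs are serialized E2E-style as "key[value], key[value], ...";
# pairs may be separated by ASCII or full-width commas.
_PAIR = re.compile(r"\s*([^\[\],，]+?)\s*\[([^\]]*)\]")


def parse_mr(mr: str) -> dict:
    """Turn an assumed 'key[value], key[value]' string into a dict."""
    return {key: value for key, value in _PAIR.findall(mr)}


def read_split(text: str) -> list:
    """Parse one split from CSV text.

    ASSUMPTION: hypothetical columns 'mr' (key-value group) and 'ref'
    (reference text); the test split has no 'ref', so it becomes None.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        pairs = parse_mr(row["mr"])
        # The description states that each MR has 3 to 8 key-value pairs.
        if not 3 <= len(pairs) <= 8:
            raise ValueError(f"MR has {len(pairs)} pairs, expected 3-8: {row['mr']!r}")
        rows.append({"mr": pairs, "ref": row.get("ref")})
    return rows


def load_split(path: str) -> list:
    """Read a split file from disk (ASSUMPTION: UTF-8, optional BOM)."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_split(f.read())
```

In use, one would call `load_split("trainset.csv")` and compare its length with `EXPECTED_SIZES["trainset.csv"]`. The stated counts are mutually consistent: 15568 + 1678 + 211 = 17457 MRs in total, and 15568 + 1678 = 17246 references, since only the training and validation instances carry reference texts.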

Full Text