Abstract

Data-to-Text Generation (D2T) is one of the most important subfields of Natural Language Generation, in which structured data is transcribed into natural language text. Several solutions have been proposed for D2T with relative success, including template-based, phrase structure grammar-based, and neural attention models. However, these methods suffer from grammatical flaws, limited naturalness, and semantic deficiencies. In this work, we propose a stochastic corpus-based model for data-to-text generation that produces a tree-form structure for sentences based on dependency information. This information comprises the dependency relations between words and the meaning labels extracted from aligned training sentences parsed with a dependency parser. By combining the dependency relations and meaning labels to construct a tree structure in a top-down manner, each word is placed into the output sentence based on its preceding and succeeding words, which yields fluent sentences with correct grammatical structures. This approach also ensures that all required semantic information is present in the output sentences while irrelevant or redundant labels are avoided. In addition, by using beam search to produce the sentence structure, the proposed model can generate highly diverse sentences. We test our model on eight domains in tabular, dialogue act, and RDF formats. Our model improves BLEU by 30% over the corpus-based state-of-the-art methods trained on the tabular datasets, and achieves BLEU scores comparable to those of neural network-based approaches trained on the dialogue act, E2E, and WebNLG datasets. Furthermore, the ERR metric for our results is always zero, meaning that our model generates sentences without losing any information. Human evaluations show that our model produces high-quality utterances in terms of informativeness, naturalness, and overall quality.
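To make the ERR claim concrete, the following is a minimal sketch of how such a score could be computed. It assumes ERR is the slot error rate commonly used in dialogue-act generation, that is, the number of missing plus redundant slots divided by the number of required slots; the abstract does not spell out the definition. The function name `slot_error_rate` and the slot names in the example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def slot_error_rate(required_slots, realized_slots):
    """Return (missing + redundant) / number of required slots.

    Assumed definition of ERR; the names here are illustrative only.
    """
    required = Counter(required_slots)
    realized = Counter(realized_slots)
    # Counter subtraction keeps only positive counts.
    missing = sum((required - realized).values())    # required but not realized
    redundant = sum((realized - required).values())  # realized but not required
    total = sum(required.values())
    if total == 0:
        raise ValueError("required_slots must not be empty")
    return (missing + redundant) / total


required = ["name", "food", "area"]
# Every required slot is realized and nothing extra appears.
print(slot_error_rate(required, ["name", "food", "area"]))  # 0.0
# Two slots are missing (food, area) and one is redundant (price).
print(slot_error_rate(required, ["name", "price"]))  # 1.0
```

Under this assumed definition, a score of zero means that every required slot appears in the output and no unrequested slot does, which matches the abstract's statement that no information is lost and redundant labels are avoided.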
