Abstract

Natural Language Generation (NLG) is the task of automatically producing natural language text to describe information present in non-linguistic data. It involves three main subtasks: (i) selecting the relevant portion of input data; (ii) determining the words that will be used to verbalise the selected data; and (iii) mapping these words into natural language text. The last of these subtasks is known as Surface Realisation (SR). In my thesis, I study the SR task in the context of input data coming from Knowledge Bases (KB). I present two novel approaches to surface realisation from knowledge bases: a supervised approach and a weakly supervised approach. In the first, supervised, approach, I present a corpus-based method for inducing a Feature-Based Lexicalized Tree Adjoining Grammar (FB-LTAG) from a parallel corpus of text and data. The resulting grammar includes a unification-based semantics and can be used by an existing surface realiser to generate sentences from test data. I show that the induced grammar is compact and generalises well over the test data, yielding results that are close to those produced by a handcrafted symbolic approach and that outperform an alternative statistical approach. In the weakly supervised approach, I explore a method for surface realisation from KB data which uses a supplied lexicon but does not require a parallel corpus. Instead, I build a corpus from heterogeneous sources of domain-related text and use it to identify possible lexicalisations of KB symbols (classes and relations) and their verbalisation patterns (frames). Based on the observations made, I build different probabilistic models which are used to select appropriate frames and to link syntax and semantics while verbalising KB inputs. I evaluate the output sentences and analyse the issues relevant to learning from non-parallel corpora. In both approaches, I use data derived from an existing biomedical ontology as a reference input.
The proposed methods are generic and can be adapted to input from other ontologies for which parallel or non-parallel corpora exist.

Highlights

  • Natural Language Generation (NLG) can be defined as the task of producing natural language text from information encoded in a machine representation system

  • We present our approaches using a sample input derived from an existing biomedical ontology; the approaches are generic and can be adapted to other ontologies

  • We have presented a supervised approach to grammar-based generation from knowledge bases, using a grammar we have induced


Summary

Introduction

1.1 Natural language generation and surface realisation (SR)

Natural Language Generation (NLG) can be defined as the task of producing natural language text from information encoded in a machine representation system (for example, databases, knowledge bases, logical formulae, etc.). [Angeli et al., 2010; Chen and Mooney, 2008; Wong and Mooney, 2007; Konstas and Lapata, 2012b; Konstas and Lapata, 2012a] trained and developed data-to-text generators on datasets from various domains including the air travel domain [Dahl et al., 1994], weather forecasts [Liang et al., 2009; Belz, 2008] and sportscasting [Chen and Mooney, 2008]. In both the symbolic and the supervised approach, considerable time and expertise must be spent on developing the required linguistic resources: in the symbolic approach, appropriate grammars and lexicons must be specified, while in the supervised approach, an aligned data-text corpus must be built for each new domain. To overcome this shortcoming, we propose an alternative, a weakly supervised approach to surface realisation from knowledge bases which could be used for any knowledge base for which there exist large textual corpora.
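
To make the idea of choosing a verbalisation frame from corpus observations more concrete, here is a minimal Python sketch. It is only an illustration under strong assumptions and not the models built in the thesis: the relation names, the frames, the example observations and the functions `frame_probabilities` and `verbalise` are all hypothetical, and the probability is a plain relative-frequency estimate.

```python
from collections import Counter

# Hypothetical (relation, frame) pairs, standing in for patterns that might be
# observed in non-parallel domain text. All names and frames are invented.
OBSERVATIONS = [
    ("hasFunction", "{subj} functions as {obj}"),
    ("hasFunction", "{subj} functions as {obj}"),
    ("hasFunction", "the function of {subj} is {obj}"),
    ("partOf", "{subj} is part of {obj}"),
]


def frame_probabilities(observations, relation):
    """Relative-frequency estimate of P(frame | relation)."""
    counts = Counter(frame for rel, frame in observations if rel == relation)
    total = sum(counts.values())
    return {frame: n / total for frame, n in counts.items()}


def verbalise(observations, subj, relation, obj):
    """Fill the most probable frame for `relation` with the two arguments."""
    probs = frame_probabilities(observations, relation)
    if not probs:
        raise KeyError(f"no frame observed for relation {relation!r}")
    best_frame = max(probs, key=probs.get)
    return best_frame.format(subj=subj, obj=obj)


if __name__ == "__main__":
    print(verbalise(OBSERVATIONS, "the cell wall", "hasFunction", "a barrier"))
```

A real system would also need to decide which KB argument fills which syntactic slot (the syntax/semantics linking mentioned in the abstract); the sketch fixes that mapping by hand through the `{subj}` and `{obj}` placeholders.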

