In this paper, we propose a language generation model for the world of ambient intelligence (AmI). Various devices in use today are connected to the Internet and provide a considerable amount of information. Because language is the most effective way for humans to communicate with one another, one approach to controlling AmI devices is to use a smart assistant based on language systems. One framework for such systems is the natural language generation (NLG) model, which performs data-to-text generation, that is, it generates text from non-linguistic data. Previously proposed NLG models employed heuristic-based approaches to generate relatively short sentences. We find that such approaches are structurally inflexible and tend to generate text that lacks diversity. Moreover, there are various domains where numerical values are important, such as sports, finance, and weather. These values need to be generated together with their categorical information (e.g., hits, home runs, and strikeouts). In the generated outputs, however, the numerical values often do not correspond accurately to the categorical information. Our proposed data-to-text generation model provides both diversity and coherence of information through a narrative context and a copy mechanism. It learns the narrative context and sentence structures from a domain corpus without requiring additional explanation of the intended category or sentential grammars. The results of experiments performed from various perspectives show that the proposed model generates text outputs containing diverse and coherent information.