Topic Knowledge Based Controlled Generation for Long Documents Using Retrieval-Based Language Models

Xuefei Zhang, Tomal Deb, Peiyang He, Guang Yang, Xuefeng Liu, Ziqing Hu, Tianyi Mao

doi:10.3233/faia231087

Abstract

Current LLM summarization systems produce broad overviews that are disconnected from readers' specific interests and expectations. These preferences (topics) can be expressed as a collection of semantic keywords. Previous work exploits such keywords as extra input for summary generation, which requires additional human annotation. To address this constraint, we propose a novel framework, Topic Knowledge based Controlled Generation (TKCG), which controls generated summaries through a set of topic keywords extracted automatically from the source documents. First, because large language models (LLMs) are limited by their context window length, we split each document into smaller pieces, such as chapters, according to the document format, since a chapter is a semantically complete section. Second, we extract topic keywords from the source documents with a transformer-based model. These topic keywords are used to retrieve the chapters related to the topic. We then feed the combination of topic keywords and retrieved chapters as a prompt to an LLM to obtain conditional summaries. We demonstrate the effectiveness of TKCG on two standard datasets, MACSum and arXiv.
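
The pipeline described in the abstract (split, extract keywords, retrieve, prompt) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions: chapters are approximated by blank-line-separated blocks, the transformer-based keyword extractor is replaced by a simple term-frequency count, retrieval is plain keyword overlap, and the prompt wording is invented for illustration. All function names (split_into_chapters, extract_keywords, retrieve_chapters, build_prompt) are hypothetical, and the LLM call itself is left out.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "we", "by", "as"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def split_into_chapters(document):
    """Stand-in for format-aware splitting: treat blank-line-separated blocks as chapters."""
    return [part.strip() for part in re.split(r"\n\s*\n", document) if part.strip()]


def extract_keywords(document, k=3):
    """Stand-in for the transformer-based extractor: take the k most frequent terms."""
    counts = Counter(tokenize(document))
    return [word for word, _ in counts.most_common(k)]


def retrieve_chapters(chapters, keywords, top_n=2):
    """Rank chapters by how often they mention the keywords; drop chapters with no match."""
    scored = []
    for index, chapter in enumerate(chapters):
        counts = Counter(tokenize(chapter))
        score = sum(counts[keyword] for keyword in keywords)
        if score > 0:
            scored.append((score, index))
    # Highest score first; ties keep document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chapters[index] for _, index in scored[:top_n]]


def build_prompt(keywords, chapters):
    """Combine topic keywords and retrieved chapters into one prompt string (hypothetical wording)."""
    body = "\n\n".join(chapters)
    return (
        "Summarize the following text, focusing on these topics: "
        + ", ".join(keywords)
        + "\n\n"
        + body
    )


if __name__ == "__main__":
    document = (
        "Retrieval models fetch passages.\n\n"
        "Cooking pasta needs salt.\n\n"
        "Retrieval improves retrieval quality."
    )
    chapters = split_into_chapters(document)
    keywords = extract_keywords(document, k=1)
    selected = retrieve_chapters(chapters, keywords)
    print(build_prompt(keywords, selected))
```

In this sketch the resulting prompt string would be passed to whichever LLM is in use. Replacing extract_keywords with a learned extractor and retrieve_chapters with a dense retriever would bring it closer to the setup the abstract describes.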

Full Text