4W1H Keyword Extraction based Summarization Model

Seungyeon Lee, Minho Lee, Taewon Park

doi:10.1109/iceic51217.2021.9369820

Abstract

In the internet era, the volume of online information is growing rapidly, creating a need for automatic summarization of text documents and making it an active area of research. Automatic keyword extraction and text summarization are Natural Language Processing (NLP) tasks that extract relevant information from large text documents. 4W1H (Who, When, Where, What, How) keywords are crucial for sentence generation. Despite their potential, there have been no approaches that use 4W1H keywords in NLP tasks, particularly summarization. In this paper, we propose a new summarization method based on 4W1H keyword extraction, which extracts the answer to a question corresponding to each event in QA format. We apply our method to BERT and ELECTRA, two well-known pre-trained Language Models (LMs) in the NLP domain, as baselines to generate a summary. In experiments, our 4W1H keyword extraction method shows promising performance on the AI Hub (https://www.aihub.or.kr/aidata/86) Machine Reading Comprehension (MRC) dataset, achieving an extraction F1-score of 84.93%. Moreover, we show the results of generating a rule-based summary from the extracted 4W1H keywords.
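The pipeline described in the abstract (one question per 4W1H slot, answered in QA format, then a rule-based summary built from the answers) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the question wording in `QUESTIONS`, the slot order in `ORDER`, the template in `rule_based_summary`, and the `toy_answer_fn` stub that stands in for a fine-tuned BERT or ELECTRA extractive QA model. None of it is taken from the paper.

```python
from typing import Callable, Dict

# Hypothetical question templates, one per 4W1H slot (not the paper's wording).
QUESTIONS: Dict[str, str] = {
    "who": "Who was involved?",
    "when": "When did it happen?",
    "where": "Where did it happen?",
    "what": "What happened?",
    "how": "How did it happen?",
}

# Assumed slot order for the summary sentence.
ORDER = ("when", "who", "what", "where", "how")


def extract_4w1h(context: str, answer_fn: Callable[[str, str], str]) -> Dict[str, str]:
    """Ask one question per slot; answer_fn(question, context) returns a text span."""
    return {slot: answer_fn(q, context).strip() for slot, q in QUESTIONS.items()}


def rule_based_summary(keywords: Dict[str, str]) -> str:
    """Join the non-empty slots in a fixed order to form one sentence."""
    parts = [keywords[slot] for slot in ORDER if keywords.get(slot)]
    if not parts:
        return ""
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."


def toy_answer_fn(question: str, context: str) -> str:
    """Stub standing in for an extractive QA model; returns canned spans."""
    canned = {
        "Who was involved?": "the city council",
        "When did it happen?": "On Monday",
        "Where did it happen?": "at city hall",
        "What happened?": "approved the budget",
        "How did it happen?": "by a unanimous vote",
    }
    return canned.get(question, "")


if __name__ == "__main__":
    text = "On Monday the city council approved the budget at city hall by a unanimous vote."
    print(rule_based_summary(extract_4w1h(text, toy_answer_fn)))
```

In a real setting, `toy_answer_fn` would be replaced by a model that predicts an answer span in the context for each question. Slots for which the model returns no answer are simply skipped by the template.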

Full Text