A Review on Automatic Text Summarization Approaches

Yogan Jaya Kumar, Halizah Basiron, Ong Sing Goh, Ngo Hea Choon, Puspalata C Suppiah

doi:10.3844/jcssp.2016.178.190

Abstract

It has been more than 50 years since the initial investigation into automatic text summarization began. Various techniques have been successfully used to extract the important content from a text document to form its summary. In this study, we review some of the work that has been conducted in this still-developing research area. The review covers the basics of text summarization, the types of summarization, the methods that have been used and some areas in which text summarization has been applied. Furthermore, this paper reviews the significant efforts made in studies concerning sentence extraction, domain specific summarization and multi document summarization, and provides the theoretical explanation and fundamental concepts related to them. In addition, the advantages and limitations of the approaches commonly used for text summarization are highlighted.

Highlights

It has been more than 50 years since Luhn started his initial investigation on automatic text summarization (Luhn, 1958)
The fundamental concepts and methods related to automatic text summarization have been discussed
This study is presented so that researchers new to this field are exposed to various automatic text summarization approaches and applications

Summary

Introduction

It has been more than 50 years since Luhn started his initial investigation on automatic text summarization (Luhn, 1958). Automatic text summarization systems can be categorized into several different types (Nenkova and McKeown, 2012; Saggion and Poibeau, 2013). These types are generally distinguished along three dimensions: input type (single or multi document), purpose (generic, domain specific, or query-based) and output type (extractive or abstractive). Domain specific summarization includes summarizing finance articles, biomedical documents, weather news, terrorist events and many more (Radev and McKeown, 1998; Verma et al., 2007; Wu and Liu, 2003). Often, this type of summarization requires domain specific knowledge bases to assist its sentence selection process.

Many summarization systems use frequency based approaches in their sentence extraction process (Klassen, 2012). Two techniques that use frequency as a basic form of measure in text summarization are word probability and term frequency-inverse document frequency.

Word Probability
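
As a rough illustration of the word probability measure named in the introduction, the sketch below estimates the probability of a word as its count divided by the total number of words in the input, then scores a sentence by the average probability of its words. The function names (`tokenize`, `word_probabilities`, `score_sentence`), the simple regular-expression tokenizer and the averaging rule are assumptions made for this example, not details taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (simplified assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def word_probabilities(text):
    """Estimate p(w) = count(w) / N, where N is the total number of tokens in the input."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def score_sentence(sentence, probs):
    """Score a sentence by the average probability of its words (illustrative choice)."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(probs.get(word, 0.0) for word in words) / len(words)
```

An extractive summarizer built on this idea would rank the sentences of the input by `score_sentence` and keep the highest-scoring ones; real systems usually also remove stop words first, which this sketch omits.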

Term Frequency–Inverse Document Frequency
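
The term frequency-inverse document frequency measure mentioned in the introduction can be sketched as follows. This is a minimal version assuming one common variant: raw term counts for the term frequency and log(N / df) for the inverse document frequency, where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the token-list input format are hypothetical choices for this example, and the paper may use a different weighting.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a corpus.

    documents: list of token lists, one per document.
    Returns one {term: weight} dict per document, using
    weight = tf(term, doc) * log(N / df(term)).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        term_freq = Counter(tokens)
        weights.append(
            {
                term: count * math.log(n_docs / doc_freq[term])
                for term, count in term_freq.items()
            }
        )
    return weights
```

Under this variant, a term that occurs in every document receives a weight of zero, which reflects the intuition that such terms do not help distinguish one document from another.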

Naive Bayes

Neural Network

Conclusion