We present a hybrid approach to Arabic text summarization. Our approach focuses on segment extraction and ranking, using heuristic methods that assign weighted scores to segments of text. We also use a text categorization system and the Arabic WordNet to identify the thematic structure of the input text, in order to select the most relevant sentences obtained from the statistical analysis process. We use a tokenizer, a stemmer, and other statistical tools borrowed from traditional information retrieval to identify relevant segments in the text. The source document is segmented into its major units (title, paragraphs, and lines), and the text lines are then interpreted to extract relevant segments for inclusion in the summary. The summarization system was tested by 1200 human evaluators, each of whom was given a copy of a newspaper article and a system-generated summary and asked to classify the summary as rejected, not-related, satisfactory, good, or accepted. 76.92% of the summaries were judged t...
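The heuristic ranking step described in the abstract, in which segments receive weighted scores and the top-ranked ones are extracted, can be illustrated with a minimal sketch. This is not the system's actual implementation: the three features (term frequency, title overlap, position), the weights `w_tf`, `w_title`, and `w_pos`, and the function names `tokenize`, `score_segments`, and `summarize` are all assumptions chosen for illustration. The tokenizer is a plain word split with no stemming, and the text categorization and Arabic WordNet stages are omitted.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the tokenizer and stemmer: a Unicode-aware
    # word split (works on Arabic script too), with no stemming applied.
    return re.findall(r"\w+", text.lower())


def score_segments(title, segments, w_tf=1.0, w_title=2.0, w_pos=0.5):
    """Return (score, index, segment) tuples sorted by descending score.

    The features and weights are illustrative assumptions, not values
    taken from the paper.
    """
    title_terms = set(tokenize(title))
    # Document-wide term counts, used as a crude salience signal.
    doc_freq = Counter(t for s in segments for t in tokenize(s))

    scored = []
    for i, seg in enumerate(segments):
        terms = tokenize(seg)
        if not terms:
            continue  # skip empty segments
        tf = sum(doc_freq[t] for t in terms) / len(terms)  # mean term count
        title_overlap = len(title_terms & set(terms))      # shared title terms
        position = 1.0 / (i + 1)                           # earlier scores higher
        score = w_tf * tf + w_title * title_overlap + w_pos * position
        scored.append((score, i, seg))

    # Highest score first; ties broken by original position.
    return sorted(scored, key=lambda x: (-x[0], x[1]))


def summarize(title, segments, k=2):
    """Pick the k best-scoring segments and return them in document order."""
    top = score_segments(title, segments)[:k]
    return [seg for _, _, seg in sorted(top, key=lambda x: x[1])]
```

As a usage example, calling `summarize("Arabic summarization", segments, k=1)` on a list of three sentences returns the one sentence that shares terms with the title, since the title-overlap weight dominates in this sketch. In a fuller pipeline, the candidates ranked here would presumably be filtered further by the thematic analysis that the abstract mentions.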