Abstract

Automatic text summarization, the process of automatically generating a condensed representation of a text, is an important task in various areas of natural language processing, such as information retrieval. It offers one solution to the problem of information overload: given the large amount of information available on the Internet, presenting a document by its summary helps users find the most relevant search results. In this paper we propose a new, free, structured Arabic corpus in the standard XML TREC format. ANT corpus v2.1 is collected using RSS feeds from different news sources. The corpus is useful for multiple text mining purposes, such as generic text summarization, clustering, or classification. We test the corpus on unsupervised single-document extractive summarization using statistical and graph-based language-independent summarizers such as LexRank, TextRank, Luhn, and LSA. We investigate the sensitivity of the summarization process to the stemming and stop-word removal steps. We evaluate the summarizers' performance by comparing the extracted text fragments to the abstracts provided in ANT corpus v2.1, using the ROUGE and BLEU metrics. Experimental results show that the LexRank summarizer achieves the best ROUGE scores in the stop-word removal scenario.
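
To make the evaluation step concrete, the following is a minimal sketch of a ROUGE-1 style unigram-overlap score between an extracted summary and a reference abstract, with optional stop-word removal. It is an illustration only, not the evaluation code used in the paper: the function name `rouge_1`, the whitespace tokenization, and the `stop_words` parameter are assumptions introduced here, and a real evaluation would typically rely on an established ROUGE implementation and an Arabic-aware tokenizer and stemmer.

```python
from collections import Counter


def rouge_1(candidate, reference, stop_words=frozenset()):
    """Illustrative ROUGE-1 precision, recall and F1 (hypothetical helper).

    Tokenization is plain whitespace splitting; tokens listed in
    `stop_words` are dropped before counting, which mimics a
    stop-word removal scenario.
    """
    def tokens(text):
        return [t for t in text.split() if t not in stop_words]

    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))

    # Clipped unigram matches: each token counts at most as often
    # as it appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())

    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Calling `rouge_1(summary, abstract)` and `rouge_1(summary, abstract, stop_words=my_stop_list)` on the same pair gives a simple way to see how stop-word removal shifts the score; `my_stop_list` is a placeholder for whatever stop-word list is in use.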
