Abstract

With the rapid increase in the amount of online text information, it became more important to have tools that would help users distinguish the important content. Automatic text summarization attempts to address this problem by taking an input text and extracting the most important content of it. However, the determination of the salience of information in the text depends on different factors and remains as a key problem of automatic text summarization. In the literature, there are some studies that use lexical chains as an indicator of lexical cohesion in the text and as an intermediate representation for text summarization. Also, some studies make use of genetic algorithms in order to examine some manually generated summaries and learn the patterns in the text which lead to the summaries by identifying relevant features which are most correlated with human generated summaries. In this study, we combine these two approaches of summarization. Firstly, some of preprocessing operations like normalizer, tokenizer, stop word remover, stemmer, and POS tagger are done on the text. After that for each sentence we have only semantic words that are independent. Then, by set of position, thematic, and coherence features we score sentences. The final score of each sentence will be the integration of those features. Each feature has its own weight and should be identified to have well summary. For this reason first system goes throw learning phase to determine ache feature weight by genetic algorithm. The next phase is testing phase. In this phase system receives new documents and uses Persian WordNet and lexical chains to extract deep level of knowledge about the text. This knowledge is combined with other higher level analysis results. Finally, sentences are scored, sorted, and selected and summary is made. We evaluated our proposed system by two methods. 1) Precision/recall, 2) TabEval (a new evaluation tool for Persian text summarizers). We compared our system with two other Persian summarizers (FarsiSum, Ijaz). Results showed that our system had higher performance rather than others (i.e. higher precision/recall average and the best average score of TabEval).

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.