Abstract

Topic segmentation is important for many natural language processing applications, such as information retrieval and text summarization. In this work, we address the topic segmentation of textual documents. We survey related work, in particular C99 and TextTiling. We then propose adaptations of these two topic segmenters for documents written in Arabic, named ArabC99 and ArabTextTiling. For the experiments, we construct an Arabic corpus from newspapers of different Arab countries. Finally, we evaluate the new segmenters by comparing them with each other and with related work, using the WindowDiff and F-measure metrics.

