Abstract

Purpose – Topic segmentation is an active research field in natural language processing, and many topic segmenters have been proposed. A current challenge for researchers is to improve these segmenters by using external resources. The purpose of this paper is therefore to integrate, study and evaluate a new external semantic resource in topic segmentation.

Design/methodology/approach – Two new topic segmenters, TSS-Onto and TSB-Onto, are proposed based on the two well-known segmenters C99 and TextTiling. The proposed segmenters integrate semantic knowledge into the segmentation process by using a domain ontology as an external resource. An evaluation then studies the effect of this resource on the quality of topic segmentation, along with a comparison with related work.

Findings – The study shows that adding semantic knowledge extracted from a domain ontology improves the quality of topic segmentation. Moreover, TSS-Onto outperforms TSB-Onto in terms of segmentation quality.

Research limitations/implications – The main limitation of this study is that the test corpus used for the evaluation is not a benchmark. However, it is a collection of scientific papers from well-known digital libraries (arXiv and ACM).

Practical implications – The proposed topic segmenters can be useful in different NLP applications such as information retrieval and text summarization.

Originality/value – The primary original contribution of this paper is the improvement of topic segmentation using semantic knowledge extracted from an ontological external resource.
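The general idea of enriching a similarity-based segmenter with ontology knowledge can be illustrated with a minimal sketch. This is not the TSS-Onto or TSB-Onto algorithm, whose details the abstract does not give. It is a simplified TextTiling-style gap comparison in which each sentence's bag of words is extended with the concepts of the terms it contains. The toy ontology, the function names (`features`, `cosine`, `segment`), the window size and the threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology: maps a surface term to a domain concept.
TOY_ONTOLOGY = {
    "svm": "classifier",
    "perceptron": "classifier",
    "router": "network_device",
    "switch": "network_device",
}

def features(sentence, ontology):
    """Bag of words plus one concept feature per term found in the ontology."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    bag = Counter(tokens)
    for tok in tokens:
        if tok in ontology:
            bag["concept:" + ontology[tok]] += 1
    return bag

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def segment(sentences, ontology, window=1, threshold=0.1):
    """Return each index i such that a topic boundary falls before sentences[i].

    Assumed rule: a boundary is placed at every gap where the similarity of
    the adjacent blocks drops below a fixed threshold.
    """
    bags = [features(s, ontology) for s in sentences]
    boundaries = []
    for gap in range(1, len(sentences)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries

sentences = [
    "The svm separates classes",
    "A perceptron learns weights",
    "The router forwards packets",
    "A switch connects hosts",
]
print(segment(sentences, TOY_ONTOLOGY))  # → [2]
print(segment(sentences, {}))            # → [1, 2, 3]
```

In this toy example the sentences share no words, so a purely lexical comparison places a boundary at every gap. The shared concept features link the two classifier sentences and the two network-device sentences, leaving a single boundary between the two topics.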
