Subtopic Segmentation for Small Corpus Using a Novel Fuzzy Model

Tao-Hsing Chang, Chia-Hoang Lee

doi:10.1109/tfuzz.2006.889911

Abstract

Subtopic segmentation is a critical task in numerous applications, including information retrieval, automatic summarization, and essay scoring. Although several approaches have been developed, many are ineffective for specific domains with a small corpus because the semantics of words and sentences in such a corpus are fuzzy. This paper addresses the problem of subtopic segmentation by proposing a fuzzy model for the semantics of both words and sentences. The model has three characteristics. First, it can deal with the uncertainty in the semantics of words and sentences. Second, it can measure the fuzzy similarity between the fuzzy semantics of sentences. Third, it provides the basis for a fuzzy algorithm that segments a text into several subtopic segments. Experiments, particularly on short texts in a specific domain with a small corpus, indicate that the method effectively increases the accuracy of subtopic segmentation over previous methods.
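To make the three characteristics concrete, the following is a minimal illustrative sketch and not the authors' model, whose details are not given in the abstract. It assumes that each word carries a fuzzy set of concept memberships, that a sentence's semantics is the fuzzy union (max) of its words' sets, that similarity is the fuzzy Jaccard ratio (sum of minima over sum of maxima), and that a subtopic boundary is placed wherever adjacent-sentence similarity falls below a threshold. The names `sentence_semantics`, `fuzzy_similarity`, `segment`, the threshold value, and the toy membership data are all hypothetical.

```python
from typing import Dict, List

# A fuzzy set maps a concept label to a membership degree in [0, 1].
FuzzySet = Dict[str, float]


def sentence_semantics(words: List[str],
                       word_memberships: Dict[str, FuzzySet]) -> FuzzySet:
    """Fuzzy union (max) of the fuzzy sets of the words in a sentence.

    Assumption: unknown words contribute nothing.
    """
    result: FuzzySet = {}
    for word in words:
        for concept, mu in word_memberships.get(word, {}).items():
            result[concept] = max(result.get(concept, 0.0), mu)
    return result


def fuzzy_similarity(a: FuzzySet, b: FuzzySet) -> float:
    """Fuzzy Jaccard similarity: sum of minima divided by sum of maxima."""
    concepts = set(a) | set(b)
    numerator = sum(min(a.get(c, 0.0), b.get(c, 0.0)) for c in concepts)
    denominator = sum(max(a.get(c, 0.0), b.get(c, 0.0)) for c in concepts)
    return numerator / denominator if denominator > 0 else 0.0


def segment(sentences: List[List[str]],
            word_memberships: Dict[str, FuzzySet],
            threshold: float = 0.3) -> List[List[int]]:
    """Group sentence indices into segments.

    A new segment starts whenever the similarity between a sentence and
    its predecessor drops below `threshold` (an arbitrary, assumed value).
    """
    if not sentences:
        return []
    semantics = [sentence_semantics(s, word_memberships) for s in sentences]
    segments: List[List[int]] = [[0]]
    for i in range(1, len(sentences)):
        if fuzzy_similarity(semantics[i - 1], semantics[i]) < threshold:
            segments.append([i])
        else:
            segments[-1].append(i)
    return segments


if __name__ == "__main__":
    # Toy, hand-assigned memberships (purely illustrative).
    memberships = {
        "fuzzy": {"logic": 0.9, "math": 0.4},
        "set": {"math": 0.8, "logic": 0.5},
        "goal": {"sport": 0.9},
        "match": {"sport": 0.7, "logic": 0.1},
    }
    text = [["fuzzy", "set"], ["set"], ["goal", "match"]]
    print(segment(text, memberships))
```

In this toy input the first two sentences share the "logic" and "math" concepts and stay together, while the third sentence is dominated by "sport" and starts a new segment. With a small corpus, graded memberships of this kind let partially related words still contribute to similarity, which is the motivation the abstract gives for a fuzzy treatment.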

Full Text