Topical Segmentation Research Articles

Generating a well-organized summary from multiple documents requires determining a proper arrangement of the extracted sentences. This paper describes our Multi-Document Summarization (MDS) system for TSC-3, focusing on an approach to coherent sentence ordering. Chronological ordering is widely used by conventional summarization systems, but it arranges sentences without considering the information each sentence presupposes. We propose a method that improves chronological ordering by resolving the precedent information of the sentences being arranged. We combine this refinement algorithm with topical segmentation and chronological ordering, and describe the experiments and metrics used to test its effectiveness for MDS. Results demonstrate that the proposed method significantly improves chronological sentence ordering. At the end of the paper, we also outline and evaluate the important sentence extraction and redundant clause elimination components integrated in our MDS system.
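For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of plain chronological ordering only, not the refinement the paper proposes (the abstract does not specify that algorithm). The `ExtractedSentence` type, its fields, and the tie-breaking rule by in-article position are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExtractedSentence:
    """Hypothetical record for a sentence picked by the extraction step."""
    text: str
    published: date  # publication date of the source article
    position: int    # index of the sentence within its source article


def chronological_order(sentences):
    """Sort by publication date; ties fall back to in-article position (assumed rule)."""
    return sorted(sentences, key=lambda s: (s.published, s.position))


extracted = [
    ExtractedSentence("The verdict was announced.", date(2004, 1, 2), 0),
    ExtractedSentence("Witnesses testified later that day.", date(2004, 1, 1), 3),
    ExtractedSentence("The trial opened on Monday.", date(2004, 1, 1), 1),
]
ordered_texts = [s.text for s in chronological_order(extracted)]
```

A sort of this kind looks only at dates and positions, which is the limitation the abstract points out: a sentence can be placed before the material it presupposes.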

Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65-84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with manually annotated boundary positions at a 0.35 false alarm rate. Categorization was considerably more challenging, with a nearest-neighbor technique yielding F=0.3. This is less than half the value obtained by the same technique on a standard newswire categorization benchmark, but replication on human-transcribed interviews showed that ASR errors explain little of that difference. The paper concludes with a description of how these capabilities could be used together to search large collections of recorded oral histories.
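The idea of segmenting on vocabulary shifts can be illustrated with a small lexical-cohesion sketch in the spirit of methods such as TextTiling. This is not the system described in the abstract: the function names, the whitespace tokenizer, the window size, and the similarity threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vocabulary_shift_boundaries(sentences, window=2, threshold=0.1):
    """Propose a boundary before sentence i when the vocabulary of the
    preceding window and the following window barely overlaps."""
    tokens = [s.lower().split() for s in sentences]
    boundaries = []
    for gap in range(1, len(tokens)):
        left = Counter(w for sent in tokens[max(0, gap - window):gap] for w in sent)
        right = Counter(w for sent in tokens[gap:gap + window] for w in sent)
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries


transcript = [
    "the cat sat",
    "the cat slept",
    "stocks fell sharply",
    "stocks rose again",
]
boundaries = vocabulary_shift_boundaries(transcript)
```

On recognized speech the input would be ASR output rather than clean sentences, so recognition errors would feed directly into the word counts; the sketch ignores that and assumes clean text.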

Articles published on Topical Segmentation

Intended boundaries detection in topic change tracking for text segmentation

Improving chronological ordering of sentences extracted from multiple newspaper articles

Automatic Recognition of Spontaneous Speech for Access to Multilingual Oral History Archives
