Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines

Yuya Akita, Hiroaki Nanjo, Masahiro Saikou, Tatsuya Kawahara

doi:10.21437/interspeech.2006-333

Abstract

This paper presents two different approaches utilizing a statistical language model (SLM) and support vector machines (SVM) for sentence boundary detection of spontaneous Japanese. In the SLM-based approach, linguistic likelihoods and occurrence of pause are used to determine sentence boundaries. To suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. On the other hand, SVM is adopted to realize robust classification against a wide variety of expressions and speech recognition errors. Detection is performed by an SVM-based text chunker using lexical and pause information as features. We evaluated these approaches on manual and automatic transcription of spontaneous lectures and speeches, and achieved F-measures of 0.85 and 0.78, respectively.

Index Terms: sentence boundary detection, spontaneous speech, statistical language model, support vector machines.

1. Introduction

Recent advances in automatic speech recognition (ASR) technology, especially for spontaneous speech, enable various applications such as spoken document archiving and retrieval, speech summarization and speech translation. To organize a spoken document in a structured form and to give useful indices, transcriptions should be segmented into appropriate units like sentences. Moreover, these applications are usually built by combining an ASR system with natural language processing (NLP) systems such as a parser and a machine translator, which often assume that input text is a sentence. However, sentences in spontaneous speech are ill-formed, and sentence boundaries are indistinct. Output text by ASR systems is just a sequence of words and has no explicit sentence boundaries, so the further step of segmenting the ASR output is required for these applications.

Automatic boundary detection of spoken sentences has been explored mainly on broadcast news (BN) tasks [1, 2, 3] and conversational telephone speech (CTS) tasks [3, 4, 5] in English. As features for detection, pause, prosodic and linguistic information is often used. The most popular approach is a combination of prosodic and linguistic information [2, 3], which realizes high performance on BN and CTS tasks. Prosody-based approaches [1, 5] have also been investigated. Meanwhile, linguistic information is not used by itself, since most of these works were performed on English data, where cue words or expressions of sentence boundaries are not easily defined.

In Japanese, cue expressions are typically observed at the end of sentences and are expected to be useful for boundary detection. However, the variety of such expressions is so large in spoken Japanese that it is hard to collect a sufficient amount of data for training statistical models such as a maximum entropy (ME) model. Moreover, many cue expressions consist of particles, which are apparently difficult to detect in ASR. Thus, robustness against ASR errors should also be investigated.

In this paper, we address two approaches to sentence boundary detection for spontaneous Japanese. As frameworks for detection, we adopt and compare a statistical language model (SLM) and support vector machines (SVM). The proposed approaches are evaluated with real lectures and speeches included in the Corpus of Spontaneous Japanese (CSJ).
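The SLM-based idea summarized in the abstract (combining linguistic likelihood with pause occurrence) can be illustrated with a toy sketch. This is not the authors' implementation: the bigram table, the `<sb>` boundary token, the back-off floor, the romanized example words, and the rule that a boundary is proposed only where a pause was observed are all assumptions made for illustration, and every probability is invented.

```python
import math

# Toy bigram log-probabilities. All values are invented for illustration;
# "<sb>" is a hypothetical sentence-boundary token treated as a word.
BIGRAM_LOGPROB = {
    ("desu", "<sb>"): math.log(0.6),
    ("<sb>", "sore"): math.log(0.3),
    ("desu", "sore"): math.log(0.05),
    ("kore", "<sb>"): math.log(0.01),
    ("kore", "wa"): math.log(0.5),
}
FLOOR = math.log(1e-4)  # crude score for unseen bigrams (assumption)


def logprob(prev, nxt):
    """Look up a bigram log-probability, falling back to the floor."""
    return BIGRAM_LOGPROB.get((prev, nxt), FLOOR)


def detect_boundaries(words, pause_after):
    """Return indices i where a boundary is hypothesized after words[i].

    A boundary is proposed only where a pause was observed and the toy
    model scores the word pair higher with the boundary token inserted
    than without it.
    """
    boundaries = []
    for i in range(len(words) - 1):
        if not pause_after[i]:
            continue
        with_sb = logprob(words[i], "<sb>") + logprob("<sb>", words[i + 1])
        without_sb = logprob(words[i], words[i + 1])
        if with_sb > without_sb:
            boundaries.append(i)
    return boundaries


words = ["kore", "wa", "hon", "desu", "sore", "wa", "pen", "desu"]
pause_after = [True, False, False, True, False, False, False, False]
print(detect_boundaries(words, pause_after))
```

In this toy input, the pause after "kore" is rejected because the model prefers the continuation "kore wa", while the pause after "desu" is accepted because the sequence scores higher with the boundary token. Pause evidence alone would mark both positions, which is the kind of false alarm that the linguistic likelihood helps suppress.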
