Automatic phrase segmentation and clustering in spontaneous speech

András Beke, György Szaszák, Viola Váradi

doi:10.1109/coginfocom.2013.6719290

Abstract

The aim of this research is to segment spontaneous speech using an unsupervised learning technique. We approach the task from a machine perception, or detection, point of view and focus on revealing some of the prosodic structure of spontaneous speech. The BEA spontaneous speech database is used to develop a speech segmentation system. The spontaneous narratives are manually annotated for intonational phrases (IPs), which are further divided into phonological phrases (PPs). A word-level transcription is also provided. For the automatic detection of IPs and the PPs embedded in them, a two-step segmentation method is applied. In the first step, the IPs are detected automatically based on speech energy, spectral centroid and a double-thresholding technique. In the second step, PPs are segmented within the IPs based on F0, energy and Kullback-Leibler divergence, combined with an adaptive thresholding method. The results show that the proposed method provides a good and efficient framework for segmenting Hungarian spontaneous speech, with performance close to that obtained on read speech.
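The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the thresholds are derived or how the features are framed. The sketch assumes that "double thresholding" means a frame counts as speech only when both its energy and its spectral centroid exceed their own thresholds, and that the Kullback-Leibler divergence is computed between two discrete distributions (for example, normalized feature histograms of neighbouring windows). The function names, parameters and sample values are hypothetical.

```python
import math


def speech_segments(energy, centroid, energy_thr, centroid_thr):
    """Group consecutive frames where BOTH features exceed their thresholds.

    Assumption: the thresholds are supplied by the caller; the abstract does
    not say how they are estimated. Returns (start, end) frame index pairs,
    with end exclusive.
    """
    n = min(len(energy), len(centroid))
    segments = []
    start = None
    for i in range(n):
        is_speech = energy[i] > energy_thr and centroid[i] > centroid_thr
        if is_speech and start is None:
            start = i  # a new candidate segment opens here
        elif not is_speech and start is not None:
            segments.append((start, i))  # the segment closes before frame i
            start = None
    if start is not None:
        segments.append((start, n))  # the segment runs to the last frame
    return segments


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps guards against division by zero; it is an illustrative choice.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical per-frame feature values, normalized to [0, 1].
energy = [0.1, 0.9, 0.8, 0.1, 0.7, 0.9]
centroid = [0.2, 0.6, 0.7, 0.8, 0.1, 0.9]
segments = speech_segments(energy, centroid, 0.5, 0.5)
```

In a second pass of this kind, a large divergence between the feature distributions on either side of a frame would mark a candidate phrase boundary; how "large" is decided (the adaptive threshold) is left open here because the abstract does not describe it.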

Full Text