Segmentation of speech signals by application of methods of hierarchical classification

P Breitkopf

doi:10.1121/1.2018067

Abstract

A multi-level segmentation of a one-dimensional signal may be induced by a hierarchical ordering of subsets in a corresponding parameter space. This concept is used to design a segmentation algorithm that creates a three-level segmentation of the speech signal. For preclassification, it uses the short-time prediction gain and the short-time variance to form the second level of the segment hierarchy, which contains the classes “pause,” “fricative,” “vocal,” and “nasal oriented.” The first level is formed by merging the segment classes “pause” and “fricative” as well as “vocal” and “nasal oriented.” Since the vocal parts comprise more than 50% of speech, a clustering procedure has been added to create a third level containing four classes that roughly correspond to four different vowel classes. The clustering algorithm uses the sampled LPC-generated log power spectrum as its parameter vector and the L2 distance as its distance measure. Five speech samples, each 1 min long, have been processed. The resulting segmentation served as the basis for a number of segment-length statistics, which suggest applications in speaker verification and speech coding. [Work supported by the VW Foundation, at the University of Hannover, Germany.]
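
The way a class hierarchy in parameter space induces nested segmentations can be illustrated with a minimal Python sketch. It is not the author's algorithm: the abstract gives no decision rules, so the thresholds (`VAR_PAUSE`, `GAIN_VOICED_DB`, `VAR_NASAL`), the rule order in `classify_frame`, and the first-level names `pause+fricative` and `vocal+nasal` are all hypothetical. Each frame is assumed to be a pair (short-time variance, short-time prediction gain in dB); the third, clustering-based level is not shown.

```python
from itertools import groupby

# Hypothetical thresholds, chosen only for illustration.
VAR_PAUSE = 0.01       # variance below this: frame treated as "pause"
GAIN_VOICED_DB = 6.0   # prediction gain below this: frame treated as "fricative"
VAR_NASAL = 0.1        # predictable frames below this variance: "nasal oriented"

# Merging second-level classes yields the first level of the hierarchy.
# The first-level names are invented here; the abstract does not name them.
LEVEL1_OF = {
    "pause": "pause+fricative",
    "fricative": "pause+fricative",
    "vocal": "vocal+nasal",
    "nasal oriented": "vocal+nasal",
}


def classify_frame(variance, gain_db):
    """Assign a second-level class to one frame (assumed decision rules)."""
    if variance < VAR_PAUSE:
        return "pause"
    if gain_db < GAIN_VOICED_DB:
        return "fricative"
    if variance < VAR_NASAL:
        return "nasal oriented"
    return "vocal"


def segments(labels):
    """Collapse runs of identical frame labels into (label, length) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


def segment_hierarchy(frames):
    """Return the (level 1, level 2) segmentations of a frame sequence."""
    level2 = [classify_frame(variance, gain_db) for variance, gain_db in frames]
    level1 = [LEVEL1_OF[label] for label in level2]
    return segments(level1), segments(level2)


# Made-up frames: (short-time variance, prediction gain in dB).
frames = [(0.001, 1.0), (0.5, 2.0), (0.5, 15.0), (0.5, 15.0), (0.05, 15.0)]
level1_segments, level2_segments = segment_hierarchy(frames)
```

Because every second-level class maps to exactly one first-level class, each first-level segment boundary is also a second-level boundary, so the two segmentations are nested. The segment lengths returned here are counted in frames and could feed segment-length statistics of the kind the abstract mentions.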
