Abstract

A rule-based segmentation and broad classification algorithm [R. A. Cole and L. Hou, Proc. ICASSP 88, 453–456 (1988)] located over 95% of segments labeled [b], [d], [g], [p], [t], [k], [ch], [jh], [dr], and [q] (glottal stop) before sonorants in utterances of the DARPA TIMIT database. Artificial neural net (ANN) classifiers were trained to discriminate among the labels using perceptually motivated features. In one condition, 37 feature measurements were used to describe (a) the averaged spectrum during the 15 ms following the release burst, (b) zero crossings and peak-to-peak amplitude contours in the region of the segment, (c) the duration of the segment, and (d) the amplitude of the plosive burst. In a second condition, an additional 16 features were used to characterize the averaged spectrum during the first 30 ms of the sonorant following the plosive. The ANN classifiers consisted of either 37 or 53 input units, 30 hidden units, and 1 output unit for each category. Classifiers were trained using backpropagation and tested on 2000 segments provided by 20 speakers. With different amounts of training, classification accuracy was consistently 3%–5% better when vowel spectra were used, suggesting that ANN classifiers are able to learn coarticulatory relationships between consonant and vowel spectra. Classification accuracy was 70% for the 10 plosive consonants.
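To make the classifier architecture concrete, the following NumPy sketch implements a two-layer net of the shape the abstract describes: 53 input units (the second feature condition), 30 hidden units, and one output unit per plosive category, trained with plain backpropagation. The sigmoid activations, squared-error objective, learning rate, and weight initialization are assumptions typical of backpropagation nets of that era, not details given in the abstract, and the feature vectors below are random placeholders rather than TIMIT measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUT, N_HIDDEN, N_OUTPUT = 53, 30, 10  # 53-feature condition; 10 plosive labels

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Small random initial weights, with a bias folded in as an extra column.
W1 = rng.uniform(-0.1, 0.1, size=(N_HIDDEN, N_INPUT + 1))
W2 = rng.uniform(-0.1, 0.1, size=(N_OUTPUT, N_HIDDEN + 1))

def forward(x):
    """One forward pass; returns hidden and output activations."""
    h = sigmoid(W1 @ np.append(x, 1.0))
    y = sigmoid(W2 @ np.append(h, 1.0))
    return h, y

def train_step(x, target, lr=0.1):
    """Backpropagation on squared error for a single segment (assumed setup)."""
    global W1, W2
    h, y = forward(x)
    # Output-layer error term for sigmoid units with squared error.
    delta_out = (y - target) * y * (1.0 - y)
    # Hidden-layer error term, backpropagated through W2 (bias column excluded).
    delta_hid = (W2[:, :-1].T @ delta_out) * h * (1.0 - h)
    W2 -= lr * np.outer(delta_out, np.append(h, 1.0))
    W1 -= lr * np.outer(delta_hid, np.append(x, 1.0))

# Toy usage: random 53-dim feature vectors with one-hot plosive targets.
LABELS = ["b", "d", "g", "p", "t", "k", "ch", "jh", "dr", "q"]
X = rng.normal(size=(200, N_INPUT))
T = np.eye(N_OUTPUT)[rng.integers(0, N_OUTPUT, size=200)]
for epoch in range(50):
    for x, t in zip(X, T):
        train_step(x, t)
_, y = forward(X[0])
print("predicted label:", LABELS[int(np.argmax(y))])
```

The 37-feature condition corresponds to setting N_INPUT to 37 and dropping the 16 vowel-spectrum features; everything else in the sketch is unchanged.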
