Abstract
A rule-based segmentation and broad classification algorithm [R. A. Cole and L. Hou, Proc. ICASSP 88, 453–456 (1988)] located over 95% of segments labeled [b], [d], [g], [p], [t], [k], [ch], [jh], [dr], and [q] (glottal stop) before sonorants in utterances of the DARPA TIMIT database. Artificial neural net (ANN) classifiers were trained to discriminate among the labels using perceptually motivated features. In one condition, 37 feature measurements were used to describe (a) the averaged spectrum during the 15 ms following the release burst, (b) zero crossings and peak-to-peak amplitude contours in the region of the segment, (c) the duration of the segment, and (d) the amplitude of the plosive burst. In a second condition, an additional 16 features were used to characterize the averaged spectrum during the first 30 ms of the sonorant following the plosive. The ANN classifiers consisted of either 37 or 53 input units, 30 hidden units, and 1 output unit for each category. Classifiers were trained using backpropagation and tested on 2000 segments provided by 20 speakers. With different amounts of training, classification accuracy was consistently 3%–5% better when vowel spectra were used, suggesting that ANN classifiers are able to learn coarticulatory relationships between consonant and vowel spectra. Classification accuracy was 70% for the 10 plosive consonants.
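The abstract specifies the network topology (37 or 53 input units, 30 hidden units, one output unit per category) and the training procedure (backpropagation), but not the feature extraction or training details. The sketch below, assuming sigmoid units and a squared-error loss as in classic backpropagation, illustrates that architecture on synthetic placeholder data; none of the feature values or hyperparameters here come from the paper.

```python
import numpy as np

# Sketch of the classifier described in the abstract: 37 input units
# (one per feature measurement), 30 hidden units, and one output unit
# per plosive category. Sigmoid activations and squared-error loss are
# assumptions; the inputs below are synthetic, not TIMIT features.
rng = np.random.default_rng(0)
N_INPUT, N_HIDDEN, N_OUTPUT = 37, 30, 10  # 10 plosive categories

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Small random initial weights, zero biases.
W1 = rng.normal(0.0, 0.1, (N_INPUT, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_OUTPUT))
b2 = np.zeros(N_OUTPUT)

def forward(x):
    """Forward pass: returns hidden activations and output activations."""
    h = sigmoid(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    return h, y

def train_step(x, target, lr=0.5):
    """One backpropagation update for a single (feature, one-hot) pair."""
    global W1, b1, W2, b2
    h, y = forward(x)
    # Output-layer delta for squared error with sigmoid units.
    d2 = (y - target) * y * (1.0 - y)
    # Hidden-layer delta, backpropagated through W2.
    d1 = (d2 @ W2.T) * h * (1.0 - h)
    W2 -= lr * np.outer(h, d2)
    b2 -= lr * d2
    W1 -= lr * np.outer(x, d1)
    b1 -= lr * d1
    return 0.5 * np.sum((y - target) ** 2)

# Demo on synthetic data: random feature vectors, random one-hot labels.
X = rng.normal(size=(200, N_INPUT))
labels = rng.integers(0, N_OUTPUT, size=200)
T = np.eye(N_OUTPUT)[labels]

for epoch in range(10):
    total_loss = sum(train_step(x, t) for x, t in zip(X, T))

preds = np.array([forward(x)[1].argmax() for x in X])
```

The second experimental condition would simply widen the input layer to 53 units (`N_INPUT = 53`) to take the 16 additional vowel-spectrum features; nothing else in the topology changes.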