Abstract

This chapter proposes a novel feature extraction technique based on the two-dimensional discrete cosine transform (DCT) of the spectrogram, in contrast to conventional approaches based on one-dimensional analysis. To demonstrate the approach, two tasks were conducted: word recognition and phoneme recognition. The word recognition task was carried out as a preliminary study on a small database of 30 names spoken by 15 speakers. For the phoneme recognition task, a series of experiments was conducted on the voiced stops of the TIMIT database, uttered by 630 speakers. The extracted features form the input patterns for training two types of neural networks: (1) a semi-dynamic network, the time-delay neural network (TDNN), and (2) a static network, the multilayer perceptron (MLP). For the word recognition task, a recognition rate of 86% was achieved for 7 names using the TDNN. For the phoneme recognition task, the highest recognition rates recorded were 77.5% for the TDNN and 72.4% for the MLP.
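The core idea of taking a two-dimensional DCT of a spectrogram can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the unnormalised DCT-II variant, the `[frequency][time]` layout, and the choice to keep only a small block of low-order coefficients as the feature vector are all assumptions made for illustration.

```python
import math


def dct2_1d(x):
    """Unnormalised DCT-II of a 1-D sequence (assumed variant)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct2_2d(matrix):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct2_1d(row) for row in matrix]
    cols = [dct2_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result has the same orientation as the input.
    return [list(row) for row in zip(*cols)]


def low_order_features(spectrogram, n_freq=3, n_time=3):
    """Keep the low-order n_freq x n_time block of coefficients.

    Assumes spectrogram[f][t] holds (log-)magnitudes; the block size is a
    hypothetical parameter, not a value taken from the chapter.
    """
    coeffs = dct2_2d(spectrogram)
    return [coeffs[u][v] for u in range(n_freq) for v in range(n_time)]


# Toy 4 x 5 "spectrogram" (4 frequency bins, 5 time frames), made-up values.
toy = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [0.5, 1.5, 2.5, 1.5, 0.5],
    [0.2, 0.8, 1.2, 0.8, 0.2],
    [0.1, 0.3, 0.5, 0.3, 0.1],
]
features = low_order_features(toy)
print(len(features))
```

A one-dimensional analysis would transform each time frame independently along the frequency axis; the two-dimensional transform also summarises how the spectrum changes across frames, and a fixed-length vector of this kind could then serve as an input pattern for a network such as an MLP.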
