Abstract

An automatic system for classifying the English stops [b, d, g, p, t, k] is proposed. It uses a preprocessing technique based on a modified Rasta-PLP algorithm and a classification algorithm based on a simplified Time Delay Neural Network (TDNN) architecture. Phonemes extracted from the TIMIT-NIST database and produced by 73 speakers were used to train and test the system. The work studies three aspects of the problem. First, what role does the preprocessing phase play in the performance of the net? Second, what number of hidden neurons best balances the trade-off between net performance and computational time? Third, must the optimal learning rate be found by trial and error, or can it be determined as a function of the input data? To this end, experiments were performed to tune the preprocessing parameters, the number of hidden neurons in the TDNN, and the learning rate. Classification rates on the test data of 92.9% for [b], 91.8% for [d], 92.4% for [g], 80.3% for [p], 90.8% for [t], and 94.2% for [k] were achieved.

Keywords: Hidden Layer, Mean Square Error, Learning Rate, Speech Signal, Hidden Neuron
