Abstract

The authors propose a method for extracting speaker-invariant features from speech spectrograms using artificial neural network (ANN) models. Feature extraction is carried out with a recognition network model, a biologically inspired pattern recognition system capable of recognizing images that are distorted or shifted in position. The network has three layers: the first extracts small, local features, and each successive layer detects progressively larger features. As pattern information propagates through the network, slight distortions and positional shifts are tolerated. The model was trained to learn vowel features from six English vowel and diphthong sounds. Initial test results show that the network learns all of the important features present in the patterns studied.
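The layer-by-layer behavior described above (small features first, larger features at each stage, with tolerance to small shifts) can be sketched roughly as alternating local feature detection and pooling. This is only an illustrative sketch of that hierarchical idea, not the authors' actual network: the kernel sizes, the number of stages, the random kernels, and the max-pooling tolerance step are all assumptions.

```python
import numpy as np

def local_features(image, kernel):
    """Correlate a small feature kernel over the input (valid mode),
    keeping only positive responses."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def tolerate_shift(fmap, pool=2):
    """Max-pool over small neighborhoods so that slight shifts or
    distortions in the input barely change the output."""
    H, W = fmap.shape
    H2, W2 = H // pool, W // pool
    return fmap[:H2 * pool, :W2 * pool].reshape(H2, pool, W2, pool).max(axis=(1, 3))

# Three stages: each stage responds to features over a larger effective
# region of the original input. The spectrogram here is random stand-in data.
rng = np.random.default_rng(0)
x = rng.random((32, 32))                       # hypothetical 32x32 spectrogram
for kernel in (rng.standard_normal((3, 3)) for _ in range(3)):
    x = tolerate_shift(local_features(x, kernel))
print(x.shape)                                 # coarse, shift-tolerant summary
```

Each pooling step halves the resolution, so the final map summarizes the whole input coarsely; this is the mechanism by which small positional shifts early on leave the top-level features largely unchanged.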
