Abstract

The performance of an auditory model has been compared with that of a conventional filterbank mel-cepstrum representation in speaker-dependent and speaker-independent spoken digit recognition tests. The model produces two outputs: one sensitive to voicing and onsets, and the other sensitive to formant structure and showing two-tone suppression. Linear discriminant analysis has been used to combine the outputs into eight coefficients. Undegraded, noisy and spectrally tilted male speech was tested with a quasi-isolated-word system. A subset of the tests was repeated with a connected-word system, and with undegraded female speech. In all cases the model performed better than the conventional representation. With degraded speech the differences were large.
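The step of combining the two model outputs into eight coefficients with linear discriminant analysis can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lda_projection`, the 20-dimensional feature size, the ten classes, and the synthetic Gaussian data are all assumptions made for illustration, standing in for frames in which both model outputs have been concatenated into one vector.

```python
import numpy as np

def lda_projection(features, labels, n_components=8):
    """Return a (dim, n_components) matrix of LDA directions.

    Hypothetical helper: textbook Fisher LDA, not the paper's code.
    """
    classes = np.unique(labels)
    dim = features.shape[1]
    overall_mean = features.mean(axis=0)
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in classes:
        x = features[labels == c]
        mean_c = x.mean(axis=0)
        centered = x - mean_c
        # Scatter of samples around their own class mean
        s_within += centered.T @ centered
        # Scatter of class means around the overall mean
        diff = (mean_c - overall_mean)[:, None]
        s_between += len(x) * (diff @ diff.T)
    # Directions maximising between-class relative to within-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Synthetic stand-in data (assumed sizes): 10 classes, 50 frames each, 20 features
rng = np.random.default_rng(0)
n_classes, per_class, dim = 10, 50, 20
class_means = rng.normal(scale=3.0, size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)
features = class_means[labels] + rng.normal(size=(n_classes * per_class, dim))

W = lda_projection(features, labels, n_components=8)
coeffs = features @ W  # eight coefficients per frame
```

With C classes, LDA yields at most C - 1 useful directions, so eight coefficients requires at least nine classes in the training labels; the ten classes used here are an assumption of the sketch.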
