Abstract

This paper provides an overview of recent approaches to deep learning as applied to speech processing tasks, primarily automatic speech recognition, but also text-to-speech synthesis and speaker, language, and emotion recognition. The focus is on efficient methods, addressing issues of accuracy, computation, storage, and delay. The discussion places these speech processing tasks in the broader context of pattern recognition, drawing comparisons with the processing of signals other than speech. It also compares deep learning with earlier methods of speech analysis, such as hidden Markov models. The paper emphasizes a thorough understanding of the choices made in analyzing and interpreting speech signals. It minimizes the use of mathematics and is aimed at non-experts; the references provide further detail for interested readers.
