Recognition and Processing of Speech Signals Using Neural Networks

Douglas O’Shaughnessy

doi:10.1007/s00034-019-01081-6

Abstract

This paper provides an overview of recent approaches to deep learning as applied to speech processing tasks, primarily automatic speech recognition, but also text-to-speech and speaker, language, and emotion recognition. The focus is on efficient methods, addressing issues of accuracy, computation, storage, and delay. The discussion places these speech processing tasks in the broader context of pattern recognition, comparing speech with other kinds of signals. It also compares machine learning with other recent methods of speech analysis, e.g., hidden Markov models. The paper emphasizes a thorough understanding of the choices made in analyzing and interpreting speech signals. It minimizes the use of mathematics and is aimed at non-experts; the references provide the needed detail for those interested.

Full Text