Abstract

While artificial neural networks have been in existence for over half a century, it was not until 2010 that they made a significant impact on speech recognition, through a deep form of such networks. This invited paper, based on my keynote talk given at the Interspeech conference in Singapore in September 2014, first reflects on the historical path to this transformative success, after providing brief reviews of earlier studies on (shallow) neural networks and on (deep) generative models relevant to the introduction of deep neural networks (DNNs) to speech recognition several years ago. The role of well-timed academic-industrial collaboration is highlighted, as are the advances in big data and big compute and the seamless integration of application-domain knowledge of speech with general principles of deep learning. Then, an overview is given of the sweeping achievements of deep learning in speech recognition since its initial success. Such achievements, summarized into six major areas in this article, have resulted in across-the-board, industry-wide deployment of deep learning in speech recognition systems. Next, more challenging applications of deep learning, namely natural language and multimodal processing, are selectively reviewed and analyzed. Examples include machine translation, knowledge base completion, information retrieval, and automatic image captioning, where fresh ideas from deep learning, continuous-space embedding in particular, are shown to be revolutionizing these application areas, albeit at a less rapid pace than for speech and image recognition. Finally, a number of key issues in deep learning are discussed, and future directions are analyzed for perceptual tasks such as speech, image, and video, as well as for cognitive tasks involving natural language.

Highlights

  • The main theme of this paper is to reflect on the recent history of how deep learning has profoundly revolutionized the field of automatic speech recognition (ASR) and to elaborate on what kind of lessons we can learn to further advance ASR technology and to impact the related, arguably more important, applications in language and multimodal processing.

  • The roles of generative models have been analyzed in the review, pointing out that the key advantages of embedding knowledge about speech dynamics that are naturally enabled by deep generative modeling have yet to be incorporated as part of the new-generation deep learning framework.

  • One remaining future challenge lies in how to effectively integrate major relevant speech knowledge and problem constraints into new deep models of the future. Examples of such knowledge and constraints would include distributed, feature-based phonological representations of sound patterns of language via hierarchical structure based on modern phonology, articulatory dynamics, and motor program control, acoustic distortion mechanisms for the generation of noisy, reverberant speech in multi-speaker environments, Lombard effects caused by modification of articulatory behavior due to noise-induced reduction of communication effectiveness, and so on.

Summary

INTRODUCTION

The main theme of this paper is to reflect on the recent history of how deep learning has profoundly revolutionized the field of automatic speech recognition (ASR) and to elaborate on what kind of lessons we can learn to further advance ASR technology and to impact the related, arguably more important, applications in language and multimodal processing. Semantic analysis of language and multimodal processing involving speech, text, and images, both of which have advanced rapidly through deep learning over the past few years, hold the potential to solve some of the remaining difficult ASR problems and present new challenges for deep learning technology.

SOME BRIEF HISTORY OF “DEEP” SPEECH RECOGNITION
ACHIEVEMENTS OF DEEP LEARNING IN SPEECH RECOGNITION
DEEP LEARNING FOR NATURAL LANGUAGE AND MULTIMODAL PROCESSING
Findings
FUTURE WORK