Personalized speech recognition on mobile devices

Ian McGraw, Francoise Beaufays, Kanishka Rao, Alexander Gruenstein, Hasim Sak, Rohit Prabhavalkar, Carolina Parada, David Rybach, Raziel Alvarez, Montse Gonzalez Arenas, Ouais Alsharif

doi:10.1109/icassp.2016.7472820

Abstract

We describe a large-vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on the fly. Our system achieves a 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real time.
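As an illustration of the quantization idea mentioned in the abstract, the following is a minimal sketch of uniform 8-bit quantization of a weight vector. It is a generic scheme and not necessarily the one used by the authors; the function names `quantize` and `dequantize` and the sample weights are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1] using a uniform step."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Guard against a constant vector, where hi == lo would give a zero step.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [c * scale + lo for c in codes]


# Hypothetical weights, for illustration only.
weights = [-0.82, -0.11, 0.0, 0.37, 0.95]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing one byte per weight instead of a 32-bit float cuts the storage for the weights by roughly a factor of four, at the cost of a reconstruction error of at most `scale / 2` per weight in this sketch.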
