Abstract

The adoption of high-accuracy speech recognition algorithms without an effective evaluation of their impact on the target computational resource is impractical for mobile and embedded systems. In this paper, techniques are adopted to minimise the computational resource required for an effective mobile-based speech recognition system. A Dynamic Multi-Layer Perceptron speech recognition technique, capable of running in real time on a state-of-the-art mobile device, is introduced. Although a conventional hidden Markov model slightly outperformed our approach on the same dataset, its processing time is much higher. The Dynamic Multi-Layer Perceptron presented here achieves an accuracy of 96.94% and runs significantly faster than similar techniques.

Highlights

  • The predominant application of this technology can be found as Siri on Apple iPhones, Cortana on Microsoft devices and Google Talk on Android devices

  • Mobile speech recognition has become an everyday phenomenon

  • The results presented here are for the TIDIGITS database because it is the only database without an overlap of speakers in the training and test files. These results show a better performance for the dynamic approach and, as such, further tests were carried out using the Dynamic Multi-layer Perceptron (MLP) only

Introduction

The predominant application of this technology can be found as Siri on Apple iPhones, Cortana on Microsoft devices and Google Talk on Android devices. All three of these systems leverage a client-server approach to achieve recognition because the speech recognition process is computationally intensive. MLPs are, by nature, feedforward networks in which the connections between units run in one direction, from the input to the output, with no connections flowing backwards. This makes MLPs static classifiers, which reflects how they are applied to speech recognition tasks. This paper does not present a general comparison of HMM and NN techniques for speech recognition but instead compares the adaptation of these technologies for on-device mobile use or for embedded systems (with little or no internet connection). The baseline HMM used as a comparison was tuned for on-device embedded use by adopting positive integer calculations with fixed-point notation.
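The two ideas in this paragraph, a strictly feedforward MLP layer and integer fixed-point arithmetic for embedded devices, can be combined in a short sketch. This is an illustrative example only: the Q8.8 format, function names and layer sizes below are assumptions for demonstration, not details taken from the paper, which states only that positive-integer fixed-point calculations were used.

```c
#include <stdint.h>

/* Illustrative Q8.8 fixed-point format: 16 fractional steps per unit is
   too coarse in practice, but 8 fractional bits keeps the example small.
   The paper does not specify the precision actually used. */
#define FRAC_BITS 8
#define TO_FIXED(x) ((int32_t)((x) * (1 << FRAC_BITS)))

/* Multiply two Q8.8 values and rescale the product back to Q8.8. */
static int32_t fx_mul(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
}

/* One feedforward MLP layer: out[j] = bias[j] + sum_i in[i] * w[j][i].
   Connections run strictly input -> output with no feedback paths,
   which is what makes a plain MLP a static classifier. */
static void mlp_layer(const int32_t *in, int n_in,
                      const int32_t *w, const int32_t *bias,
                      int32_t *out, int n_out) {
    for (int j = 0; j < n_out; j++) {
        int32_t acc = bias[j];
        for (int i = 0; i < n_in; i++)
            acc += fx_mul(in[i], w[j * n_in + i]);
        out[j] = acc;
    }
}
```

All arithmetic stays in 32-bit integers (with a 64-bit intermediate in the multiply to avoid overflow), so the layer runs without floating-point hardware, which is the motivation for fixed-point tuning on embedded targets.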

Feature extraction
Experimentation data
Automatic speech recognition classifiers
Proposed Dynamic Multi-layer Perceptron implementation
Standard Multi-layer perceptron
Dynamic MLP experimentation
Hidden Markov model implementation
Dynamic Multi-layer Perceptron results
HMM results
Memory and time comparison between DMLP and HMM
Deep neural networks
Conclusion
Compliance with ethical standards

