Abstract
Adopting high-accuracy speech recognition algorithms without effectively evaluating their impact on the target computational resource is impractical for mobile and embedded systems. In this paper, techniques are adopted to minimise the computational resource required for an effective mobile-based speech recognition system. A Dynamic Multi-Layer Perceptron speech recognition technique, capable of running in real time on a state-of-the-art mobile device, is introduced. Although a conventional hidden Markov model applied to the same dataset slightly outperformed our approach, its processing time is much higher. The Dynamic Multi-Layer Perceptron presented here achieves an accuracy of 96.94% and runs significantly faster than similar techniques.
Highlights
The predominant applications of mobile speech recognition are Siri on Apple iPhones, Cortana on Microsoft devices and Google talk on Android devices.
Mobile speech recognition has become an everyday phenomenon.
The results presented here are for the TIDIGITS database because it is the only database without an overlap of speakers in the training and test files. These results show a better performance for the dynamic approach and, as such, further tests were carried out using the Dynamic Multi-Layer Perceptron (MLP) only.
Summary
The predominant applications of this technology are Siri on Apple iPhones, Cortana on Microsoft devices and Google talk on Android devices. All three of these systems use a client-server approach to achieve recognition because the speech recognition process is computationally intensive. MLPs are, by nature, feedforward networks: the connections between units run in one direction only, from the input to the output, with no connections flowing backwards. This makes MLPs static classifiers, which is reflected in how they are applied to speech recognition tasks. This paper does not present a general comparison of HMM and NN techniques for speech recognition; instead, it compares adaptations of these technologies for on-device mobile use or for embedded systems (with little or no internet connection). The baseline HMM used for comparison was tuned for on-device embedded use by adopting positive integer calculations with fixed-point notation.
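To illustrate what positive integer calculation with fixed-point notation can look like, the following is a minimal Python sketch. It is not the implementation used for the baseline HMM: the source does not state the word length or the number of fractional bits, so the 8 fractional bits (`FRAC_BITS`) and the helper names `to_fixed`, `fixed_mul` and `to_float` are assumptions made purely for illustration.

```python
# Hypothetical sketch of unsigned fixed-point arithmetic using integers only.
# FRAC_BITS and all function names are illustrative assumptions, not taken
# from the paper.

FRAC_BITS = 8             # assumed number of fractional bits
SCALE = 1 << FRAC_BITS    # scaling factor, 2**8 = 256


def to_fixed(x: float) -> int:
    """Encode a non-negative real value as a fixed-point integer."""
    if x < 0:
        raise ValueError("this sketch handles non-negative values only")
    return int(round(x * SCALE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values.

    The raw integer product carries 2 * FRAC_BITS fractional bits, so it is
    shifted right by FRAC_BITS to return to the working format.
    """
    return (a * b) >> FRAC_BITS


def to_float(a: int) -> float:
    """Decode a fixed-point integer back to a float, for inspection only."""
    return a / SCALE


if __name__ == "__main__":
    a = to_fixed(1.5)     # 384
    b = to_fixed(2.25)    # 576
    print(to_float(fixed_mul(a, b)))  # prints 3.375
```

Addition needs no rescaling in this representation, since both operands share the same scale; only multiplication requires the corrective shift. On a device without a floating-point unit, these integer operations are typically much cheaper than emulated floating-point arithmetic.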