Abstract

Training material is difficult to obtain for many seldom-used English words and (often non-English) names, and collecting it raises personal privacy concerns. Language-independent (LI), lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising way to address this problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR applications with limited storage space and a small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control in vehicles and robots. Although we have successfully developed two fast and accurate DTW variations for clean speech data, speech recognition in adverse conditions remains a major challenge. To improve recognition accuracy in noisy environments and under bad recording conditions, such as a volume that is too high or too low, we introduce a novel one-against-all weighted DTW (OAWDTW). This method defines a one-against-all index (OAI) for each time frame of the training data and applies the OAIs in the core DTW process. Given two speech signals, OAWDTW uses the OAIs during the DTW process to tune their final alignment score. Our method achieves better accuracy than DTW and merge-weighted DTW (MWDTW): in extensive experiments on one representative SD dataset of recordings from four speakers, we observe a 6.97% relative reduction of error rate (RRER) compared with DTW and a 15.91% RRER compared with MWDTW. To the best of our knowledge, OAWDTW is the first weighted DTW approach specifically designed for speech data in adverse conditions.
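To make the ideas in the abstract concrete, the following is a minimal sketch of classic DTW with an optional per-frame weight on the template (training) sequence, plus a helper for the relative reduction of error rate. The function names, the `frame_weights` parameter, and the choice to multiply each local distance by the weight are assumptions made for illustration only; the abstract does not specify how the OAI is computed or how it enters the DTW recursion, so this should not be read as the authors' OAWDTW.

```python
import math


def dtw_distance(template, query, frame_weights=None):
    """Cumulative DTW alignment cost between two sequences of feature vectors.

    frame_weights is a hypothetical per-frame weight for the template; it
    stands in for a per-frame index such as the OAI and scales the local
    distance (an assumption, not the paper's definition).
    """
    n, m = len(template), len(query)
    if frame_weights is None:
        frame_weights = [1.0] * n  # unweighted: plain DTW

    # cost[i][j] = best cumulative cost of aligning template[:i] with query[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames, scaled by the weight
            local = frame_weights[i - 1] * math.dist(template[i - 1], query[j - 1])
            # Allowed steps: advance template only, query only, or both
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def relative_error_reduction(baseline_error, new_error):
    """Relative reduction of error rate (RRER), in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

In an SD recognizer of this kind, a test utterance would typically be compared against every stored template and assigned the label of the template with the lowest alignment cost.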

Highlights

  • This paper studies language-independent (LI), lightweight speaker-dependent (SD) automatic speech recognition (ASR) in adverse conditions, such as noisy environments and bad recording conditions where the volume is too high or too low

  • To test whether one-against-all weighted dynamic time warping (OAWDTW) is suitable for language-independent (LI), speaker-dependent (SD) automatic speech recognition (ASR), we need a multi-language speech corpus in which each word is recorded at least twice: once as training data and once as test data

  • We introduce a novel one-against-all weighted dynamic time warping (OAWDTW) to provide efficient automatic speech recognition in noisy environments and under bad recording conditions where the volume is too high or too low


Summary

Introduction

This paper studies language-independent (LI), lightweight speaker-dependent (SD) automatic speech recognition (ASR) in adverse conditions, such as noisy environments and bad recording conditions where the volume is too high or too low. ASR recognizes human speech using computer algorithms without the involvement of humans [3]. It is essentially a pattern recognition process. Because these applications should be usable both online and offline, they can be developed as speaker-dependent (SD) applications. Many corporations, such as Google and Microsoft, have developed mature speaker-independent (SI) ASR applications. Most current applications are language-dependent (LD). Such LD SI ASR systems are based on the Hidden Markov Model (HMM) [5], whose accuracy depends on the amount of training data. Given the difficulty of obtaining training data for seldom-used English words and (often non-English) names, together with personal privacy concerns, LI lightweight SD ASR is a promising way to address the problem.

