Abstract

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, because English spoken by Korean speakers shows phonetic variations, we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, because the imbalanced training data bias the model toward Korean, we extracted CS sentences semantically similar to the target domain and used them for language model (LM) adaptation. The training data were AI Hub (1033 h, Korean) and LibriSpeech (960 h, English). Compared to the baseline, the proposed method achieved an error reduction rate (ERR) of up to 11.6% with phonetic variant modeling and 17.3% when semantically similar sentences were applied to the LM adaptation. When only English words were considered, the word correction rate improved by up to 24.2% over the baseline. These results suggest that the proposed method is effective for CS speech recognition.
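The abstract reports its gains as an error reduction rate (ERR) but does not define it. A minimal sketch, assuming the common definition of ERR as the relative reduction of an error rate with respect to the baseline; the function name and the numbers below are hypothetical and are not taken from the paper:

```python
def error_reduction_rate(baseline_error: float, proposed_error: float) -> float:
    """Relative error reduction in percent.

    Assumed definition (not stated in the abstract):
        ERR = (baseline - proposed) / baseline * 100
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - proposed_error) / baseline_error * 100.0


# Made-up error rates, chosen only to illustrate the arithmetic.
baseline = 20.0  # hypothetical baseline error rate, in percent
proposed = 17.0  # hypothetical error rate of a modified system, in percent
print(f"ERR = {error_reduction_rate(baseline, proposed):.1f}%")
```

Under this reading, an ERR of 11.6% would mean the proposed system removes 11.6% of the baseline's errors, not that the absolute error rate drops by 11.6 percentage points.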

Highlights

  • Automatic speech recognition (ASR) and speech translation (ST) based on end-to-end (E2E) frameworks have shown significant improvements

  • In the case of Korean, English words pronounced by Korean speakers, known as Korean-style English (i.e., Konglish), differ phonetically in many ways from native-like English pronunciation

  • To address both the data imbalance and the scarcity of CS resources, we propose a hybrid method based on phonetic knowledge and deep learning that integrates Korean and English data

Summary

Introduction

Automatic speech recognition (ASR) and speech translation (ST) based on end-to-end (E2E) frameworks have shown significant improvements. These systems have been widely adopted in real-life situations, such as lectures, business meetings, and human–machine conversations. To gauge the effect of CS, we investigated how often Korean sentences contain English words. These problems can be categorized into two types: the inter-sentential, where language transitions occur at the phrase, sentence, or discourse boundaries; and the.

Related Work
Phonetic Variant Modeling
Phoneme Mapping Using Phonetic Knowledge
Applying the Korean–Konglish Mixed Model
Applying Domain Adaptation Using Shallow Fusion
Findings
Conclusions