Abstract

Language identification (LID) is the task of automatically determining the language being spoken in a given speech segment. In code-switching speech, speakers switch rapidly between two or more languages within a single conversation. Although LID attains high accuracy on medium and long utterances, its performance on short utterances of code-switching speech remains unsatisfactory. We propose a Bidirectional Encoder Representations from Transformers (BERT)-based LID system (BERT-LID) to improve LID performance, especially on short-duration code-switching speech segments. We extend the original BERT model by taking as input phonetic posteriorgrams (PPGs) extracted from a front-end phone recognizer, and deploy a deep classifier on top for LID. Our BERT-LID model improves identification accuracy by approximately 6.5% on long segments and 19.9% on short segments, demonstrating its effectiveness for code-switching speech LID.
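The abstract describes the pipeline only at a high level. As a concrete illustration, below is a minimal sketch of the idea in PyTorch with HuggingFace Transformers: PPG frames are projected into BERT's embedding space and fed in via `inputs_embeds`, with a classifier head on the pooled output. The dimensions (`PPG_DIM`, `NUM_LANGS`), the checkpoint name, and the linear-projection front end are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the BERT-LID idea, assuming PyTorch + HuggingFace
# Transformers. PPG_DIM, NUM_LANGS, and the projection layer are
# hypothetical choices for illustration only.
import torch
import torch.nn as nn
from transformers import BertModel

PPG_DIM = 144    # hypothetical: size of each phonetic posteriorgram frame
NUM_LANGS = 2    # hypothetical: e.g. two languages in code-switching speech

class BertLID(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # Project continuous PPG frames into BERT's embedding space so they
        # can stand in for the usual token embeddings.
        self.proj = nn.Linear(PPG_DIM, hidden)
        # Deep classifier head over the pooled sequence representation.
        self.classifier = nn.Sequential(
            nn.Linear(hidden, hidden // 2),
            nn.ReLU(),
            nn.Linear(hidden // 2, NUM_LANGS),
        )

    def forward(self, ppgs, attention_mask=None):
        # ppgs: (batch, frames, PPG_DIM) posteriors from a phone recognizer
        embeds = self.proj(ppgs)
        out = self.bert(inputs_embeds=embeds, attention_mask=attention_mask)
        return self.classifier(out.pooler_output)

# Usage: score a batch of 3 segments, 50 PPG frames each.
model = BertLID()
logits = model(torch.rand(3, 50, PPG_DIM))
print(logits.shape)  # torch.Size([3, 2])
```

Feeding PPGs rather than raw acoustics lets the model reuse BERT's sequence modeling over phone-level units, which is what makes it applicable to short code-switched segments.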
