Abstract

Language identification (LID) is the task of automatically determining the language being spoken in a given speech segment. In code-switching speech, speakers switch rapidly between two or more languages within a single conversation. Although LID attains high accuracy on medium and long utterances, its performance on short utterances of code-switching speech remains unsatisfactory. We propose a Bidirectional Encoder Representations from Transformers (BERT)-based LID system (BERT-LID) to improve LID performance, especially on short-duration code-switching speech segments. We extend the original BERT model by taking as input phonetic posteriorgrams (PPGs) extracted from a front-end phone recognizer, and deploy a deep classifier on top for LID. Our BERT-LID model improves identification accuracy by approximately 6.5% on long segments and 19.9% on short segments, demonstrating its effectiveness for code-switching speech LID.
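The abstract describes the pipeline only at a high level. As a concrete illustration, below is a minimal sketch of the idea in PyTorch with HuggingFace Transformers: PPG frames are projected into BERT's embedding space and fed in via `inputs_embeds`, with a classifier head on the pooled output. The dimensions (`PPG_DIM`, `NUM_LANGS`), the checkpoint name, and the linear-projection front end are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the BERT-LID idea, assuming PyTorch + HuggingFace
# Transformers. PPG_DIM, NUM_LANGS, and the projection layer are
# hypothetical choices for illustration only.
import torch
import torch.nn as nn
from transformers import BertModel

PPG_DIM = 144    # hypothetical: size of each phonetic posteriorgram frame
NUM_LANGS = 2    # hypothetical: e.g. two languages in code-switching speech

class BertLID(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # Project continuous PPG frames into BERT's embedding space so they
        # can stand in for the usual token embeddings.
        self.proj = nn.Linear(PPG_DIM, hidden)
        # Deep classifier head over the pooled sequence representation.
        self.classifier = nn.Sequential(
            nn.Linear(hidden, hidden // 2),
            nn.ReLU(),
            nn.Linear(hidden // 2, NUM_LANGS),
        )

    def forward(self, ppgs, attention_mask=None):
        # ppgs: (batch, frames, PPG_DIM) posteriors from a phone recognizer
        embeds = self.proj(ppgs)
        out = self.bert(inputs_embeds=embeds, attention_mask=attention_mask)
        return self.classifier(out.pooler_output)

# Usage: score a batch of 3 segments, 50 PPG frames each.
model = BertLID()
logits = model(torch.rand(3, 50, PPG_DIM))
print(logits.shape)  # torch.Size([3, 2])
```

Feeding PPGs rather than raw acoustics lets the model reuse BERT's sequence modeling over phone-level units, which is what makes it applicable to short code-switched segments.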
