Integration of Phonotactic Features for Language Identification on Code-Switched Speech

Koena Mabokela

doi:10.5121/ijnlc.2022.11102

Abstract

In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). A one-pass recognition system converts the spoken sounds into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. Because the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The system was tested on a speech corpus of Sepedi and English, two languages that are often mixed. It is evaluated by measuring the ASR performance and the LID performance separately. The systems obtained promising ASR accuracy with the data-driven phone merging approach, modelled using 16 Gaussian mixtures per state. On code-switched speech and monolingual speech segments, respectively, the proposed systems achieved acceptable ASR and LID accuracy.
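To make the phonotactic idea concrete, here is a minimal sketch of a phone-level bigram language model used to score a recognized phone sequence against each language. It is an illustration only, not the SVM back-end described in the abstract: the function names, the add-one smoothing, the `lang_a`/`lang_b` labels, and the toy phone symbols are all assumptions introduced here.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count phone bigrams and their left contexts, with boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def log_prob(seq, model, vocab_size):
    """Add-one smoothed bigram log-probability of a phone sequence."""
    contexts, bigrams, _ = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total

def identify(seq, models):
    """Return the language whose bigram model scores the sequence highest."""
    # Shared vocabulary so that smoothing is comparable across languages.
    vocab = set().union(*(m[2] for m in models.values()))
    return max(models, key=lambda lang: log_prob(seq, models[lang], len(vocab)))

# Hypothetical toy data: placeholder phone symbols, not real transcriptions.
models = {
    "lang_a": train_bigram([["a", "b", "a", "b"]]),
    "lang_b": train_bigram([["c", "d", "c", "d"]]),
}
print(identify(["a", "b"], models))  # lang_a
```

In a system like the one described, per-language scores of this kind could serve as input features to a classifier rather than being compared directly.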

Highlights

Even though we report only on experiments conducted in South Africa on two official languages, we believe that the same process can be applied to other under-resourced languages as well [23].
Each hidden Markov model (HMM) state distribution is modelled by an 8-component Gaussian mixture model (GMM) with a diagonal covariance matrix.
This paper presents the incorporation of phonotactic information to perform multilingual ASR-LID on mixed-language speech.
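As an illustration of the diagonal-covariance GMM state distribution mentioned in the highlights, the sketch below computes the log-likelihood of one feature vector under such a mixture. The function names and the parameter layout (lists of weights, means, and per-dimension variances) are assumptions made here; the paper's actual toolkit and feature dimensions are not stated in this summary.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (per-dimension variances)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a mixture, using log-sum-exp for stability."""
    comps = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))
```

With a diagonal covariance, each component needs only one variance per feature dimension, which keeps the parameter count low when training data is limited.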

Summary

Introduction

It is common for multilingual speakers to engage in code-switching, that is, switching between more than one language within an utterance, a phenomenon also known as mixed-language usage [1]. In multilingual societies, it seems to be commonly preferred. South Africa is a multilingual nation with eleven official languages. South African languages are often used in a mixed mode (e.g., in radio and television dramas, news broadcasts, religious worship services, interviews, and presentations). Native speakers of African languages often use English to express numerical digits, times, and codes. In South Africa, it is common to hear more than one language being spoken in the same area.

Objectives

Results

Conclusion