MECOS: A bilingual Manipuri–English spontaneous code-switching speech corpus for automatic speech recognition

Naorem Karline Singh, Yambem Jina Chanu, Hoomexsun Pangsatabam

doi:10.1016/j.csl.2024.101627

Journal: Computer Speech & Language
Publication Date: Feb 20, 2024
Citations: 3

Abstract

In this study, we introduce a new code-switched speech database containing 57 h of annotated spontaneous Manipuri–English speech. Manipuri is an official language of India, spoken primarily in the northeastern Indian state of Manipur. Most native speakers in Manipur today are bilingual and frequently code-switch in everyday conversation. The recordings were gathered from YouTube after carefully assessing the amount of code-switched speech in each video. The database contains 21,339 utterances and 291,731 instances of code switching. Given the code-switched nature of the data, a dedicated annotation procedure is followed: the data are manually annotated using the Unicode Meitei Mayek script for Manipuri and the Roman alphabet for English. Each transcription includes speaker information, non-speech information, and the corresponding annotation. The aim of this research is to build an automatic speech recognition (ASR) system and to provide a thorough analysis and description of the speech corpus. We believe this is the first work to develop an ASR system for Manipuri–English code-switched speech. To evaluate performance, we develop ASR systems based on a hybrid deep neural network and hidden Markov model (DNN–HMM), a time delay neural network (TDNN), a hybrid time delay neural network and long short-term memory model (TDNN–LSTM), and three end-to-end (E2E) models: a hybrid connectionist temporal classification and attention model (CTC-Attention), Conformer, and wav2vec XLSR. Among these models, the pure TDNN clearly performs best.
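
Because Manipuri is transcribed in Meitei Mayek and English in the Roman alphabet, the language of each token can in principle be read off its script. The following is a minimal sketch of how switch points might be counted in one transcribed utterance under that assumption. The abstract does not define how the 291,731 code-switching instances were counted, so the function names (`script_of`, `count_switches`) and the counting rule (a switch is any change of script between consecutive word tokens) are illustrative assumptions, not the authors' procedure. The sketch also assumes that speaker labels and non-speech tags have already been stripped from the transcript.

```python
from typing import Optional

# Unicode ranges for the Meetei Mayek block and its Extensions block.
MEETEI_MAYEK = (0xABC0, 0xABFF)
MEETEI_MAYEK_EXT = (0xAAE0, 0xAAFF)


def script_of(token: str) -> Optional[str]:
    """Return 'mni' for a Meitei Mayek token, 'eng' for a Roman-alphabet
    token, or None if the token carries no letters (punctuation, digits).
    The first letter found decides the label."""
    for ch in token:
        cp = ord(ch)
        if (MEETEI_MAYEK[0] <= cp <= MEETEI_MAYEK[1]
                or MEETEI_MAYEK_EXT[0] <= cp <= MEETEI_MAYEK_EXT[1]):
            return "mni"
        if ch.isascii() and ch.isalpha():
            return "eng"
    return None


def count_switches(utterance: str) -> int:
    """Count positions where consecutive letter-bearing tokens differ in
    script. This counting rule is an assumption made for illustration."""
    switches = 0
    previous = None
    for token in utterance.split():
        current = script_of(token)
        if current is None:
            continue  # skip tokens with no letters
        if previous is not None and current != previous:
            switches += 1
        previous = current
    return switches


# Example: a Meitei Mayek word (written with escapes), an English word,
# then the Meitei Mayek word again gives two switch points.
word = "\uabc3\uabe4\uabc7\uabe9"
print(count_switches(f"{word} school {word}"))  # prints 2
```

Labelling by script keeps the sketch independent of any lexicon, but it would mislabel English words written in Meitei Mayek or Manipuri words written in Roman letters, so it is only as reliable as the annotation convention it relies on.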

Full Text