Abstract
In this study, we introduce a new code-switched speech database containing 57 hours of annotated spontaneous Manipuri–English speech. Manipuri is an official language of India and is primarily spoken in the northeastern Indian state of Manipur. Most native Manipuri speakers today are bilingual and frequently code-switch in everyday conversation. Recordings were gathered from YouTube after carefully assessing the amount of code-switched speech in each video. The database contains 21,339 utterances and 291,731 instances of code switching. Given the code-switched nature of the data, a dedicated annotation procedure is used: the data are manually annotated using the Meitei Mayek script (in Unicode) for Manipuri and the Roman alphabet for English. Each transcription includes speaker information, non-speech information, and the corresponding annotation. The aim of this research is to build an automatic speech recognition (ASR) system and to provide a thorough analysis and description of the speech corpus. To our knowledge, this is the first work to develop an ASR system for Manipuri–English code-switched speech. To evaluate performance, we develop ASR systems for Manipuri–English based on a hybrid deep neural network and hidden Markov model (DNN–HMM), a time delay neural network (TDNN), a hybrid time delay neural network and long short-term memory (TDNN–LSTM), and three end-to-end (E2E) models, i.e., a hybrid connectionist temporal classification and attention model (CTC-Attention), Conformer, and wav2vec XLSR. Among these models, the pure TDNN clearly outperforms the others.