Code-switching presents major difficulties for automatic speech recognition (ASR), especially in low-resource languages such as Hindi and Marathi. This work provides a 450-hour annotated dataset of Hindi-Marathi code-switching covering tag-switching, intra-sentential, and inter-sentential patterns. We augment a transformer-based ASR architecture built on wav2vec2 with dynamic switching algorithms driven by Q-learning, a reinforcement learning method, to optimize language transition points dynamically. With a Word Error Rate (WER) of 0.2800 and a Character Error Rate (CER) of 0.2400, the proposed model outperforms conventional HMM-GMM and RNN-based ASR systems. Combining reinforcement learning for dynamic code-switching with transformer-based self-supervised learning demonstrates improved accuracy and flexibility. Comparative analysis shows the gains relative to heuristic methods, Kaldi baselines, and pre-trained monolingual models. This work underscores the importance of hybrid architectures, dynamic algorithms, and sophisticated acoustic modeling for code-switched speech recognition, offering a comprehensive framework for multilingual ASR. The results have significant implications for the development of ASR in linguistically diverse and economically constrained environments.
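The abstract does not detail the state, action, or reward design used for the Q-learning-based switching, so the following is only a minimal sketch under assumed simplifications: a toy utterance with per-frame language tags, a state consisting of the frame index and current language, two actions (stay or switch), and a +1/-1 reward for matching the next gold tag. In the paper's setting the state would presumably instead be built from wav2vec2 acoustic representations.

```python
"""Minimal, illustrative Q-learning sketch for language-switch decisions.

This is not the paper's implementation: the per-frame language tags, the
(position, language) state, and the +1/-1 agreement reward below are
assumptions made purely for illustration.
"""
import random
from collections import defaultdict

ACTIONS = ["stay", "switch"]          # keep current language vs. transition
LANGS = ["hi", "mr"]                  # Hindi / Marathi

# Hypothetical frame-level language labels for one code-switched utterance.
gold_tags = ["hi"] * 5 + ["mr"] * 4 + ["hi"] * 6

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
Q = defaultdict(float)                # Q[(state, action)] -> estimated value


def step(current_lang, action):
    """Apply a switching action and return the resulting language."""
    if action == "switch":
        return "mr" if current_lang == "hi" else "hi"
    return current_lang


def choose_action(state):
    """Epsilon-greedy policy over the two switching actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


for episode in range(500):
    lang = "hi"                                   # assume the utterance starts in Hindi
    for t in range(len(gold_tags) - 1):
        state = (t, lang)                         # toy state: frame index + current language
        action = choose_action(state)
        lang = step(lang, action)
        # Reward: +1 if the chosen language matches the next gold tag, else -1.
        reward = 1.0 if lang == gold_tags[t + 1] else -1.0
        next_state = (t + 1, lang)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update.
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Greedy rollout after training: predicted language transition points.
lang, switches = "hi", []
for t in range(len(gold_tags) - 1):
    action = max(ACTIONS, key=lambda a: Q[((t, lang), a)])
    new_lang = step(lang, action)
    if new_lang != lang:
        switches.append(t + 1)
    lang = new_lang
print("Predicted switch points:", switches)       # gold transitions are at frames 5 and 9
```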