CodeFed: Federated Speech Recognition for Low-Resource Code-Switching Detection

Chetan Madan, Harshita Diddee, Deepika Kumar, Mamta Mittal

doi:10.1145/3571732

Abstract

Code switching is a common obstacle to the practical application of speech recognition. The problem is especially acute for Indian languages, since most massively multilingual models are trained on corpora that do not represent the diverse set of Indian languages. A related constraint is that applications built to collect such representative data tend to be privacy-intrusive. To mitigate both problems together, this work presents CodeFed, a federated learning-based code-switching detection model that can be deployed and trained collaboratively on private data from multiple users without compromising their privacy. Using a representative low-resource Indic dataset, we demonstrate the superior performance of a global model trained collaboratively with federated learning on three low-resource Indic languages (Gujarati, Tamil, and Telugu) and compare the model against the most recent work in the field. Finally, to assess the practical feasibility of the proposed system, we also present a system overview of the label generation architecture that may accompany a real-time deployment of CodeFed.

Full Text