Abstract
Code-switching is a common constraint in the practical application of speech recognition. The problem is especially acute for Indian languages, since most massively multilingual models are trained on corpora that do not represent India's linguistic diversity. A related constraint is the privacy-intrusive nature of applications that aim to collect such representative data. To mitigate both problems at once, this work presents CodeFed: a federated learning-based code-switching detection model that can be trained collaboratively on private data from multiple users without compromising their privacy. Using a representative low-resource Indic dataset, we demonstrate the superior performance of a global model trained with federated learning on three low-resource Indic languages (Gujarati, Tamil, and Telugu) and compare it against the most recent work in the field. Finally, to assess the practical realizability of the proposed system, we outline the label generation architecture that would accompany a real-time deployment of CodeFed.
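The abstract does not specify which aggregation scheme CodeFed uses; a FedAvg-style weighted average is the canonical choice for this kind of setup. The sketch below is a minimal illustration under that assumption: the `fedavg` function, the `"dense"` layer name, and the client dataset sizes are all hypothetical, chosen only to show how per-client weights trained on private speech data could be combined into a global model without raw audio ever leaving a device.

```python
# Hypothetical FedAvg-style aggregation: each client trains a local copy of the
# code-switching detector on its own private speech data, and only the model
# weights (never the raw audio) are sent back for averaging.
from typing import Dict, List

import numpy as np


def fedavg(client_weights: List[Dict[str, np.ndarray]],
           client_sizes: List[int]) -> Dict[str, np.ndarray]:
    """Average client weights, weighted by each client's local dataset size."""
    total = sum(client_sizes)
    return {
        name: sum((n / total) * w[name]
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }


# Toy round: three clients (e.g., Gujarati, Tamil, and Telugu speakers) with
# randomly initialised weights for a single dense layer.
rng = np.random.default_rng(0)
clients = [{"dense": rng.normal(size=(4, 2))} for _ in range(3)]
global_model = fedavg(clients, client_sizes=[120, 300, 80])
print(global_model["dense"].shape)  # -> (4, 2)
```

Weighting by dataset size keeps clients with more local utterances from being drowned out by smaller ones; an unweighted mean is the degenerate case where all sizes are equal.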