Abstract

Speech communication requires neither sight nor the use of the hands; both humans and machines can communicate simply by speaking and listening. Speech-based interfaces therefore benefit not only people with vision or motor disabilities but also, through assistive devices, people with hearing loss or language disorders. Assistive devices for people with hearing loss can be classified into audio-visual stimulation devices, tactile devices, and speech processing devices. Such devices exploit the characteristics of speech and/or embed automatic speech recognition (ASR). This article focuses on an end-to-end (E2E) ASR system that can be useful in such devices. The research is carried out on Marathi, an Indian language with over 80 million speakers. The resulting evaluations provide valuable insights into the development of speech recognition models for low-resource Indian languages. We present an E2E ASR model trained on code-switched Marathi text and decoded in the same way as a standard E2E ASR system. Experimental results on the code-switched Marathi corpus show that the model achieves a word error rate (WER) of 35.09%. We compare our model with Facebook's wav2vec2 model, which yields a WER of 23.72% on the code-switched Marathi dataset without any language model and 12.10% on open Marathi ASR datasets.
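
The WER figures above follow the standard definition: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The following minimal Python sketch illustrates how such a score is computed; the transcripts in the usage example are invented for illustration and are not drawn from the paper's corpus.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate between a reference and a hypothesis transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # Hypothetical transcripts: a WER of 35.09% means roughly one reference
    # word in three is substituted, deleted, or inserted by the recognizer.
    ref = "the model was trained on code switched marathi speech"
    hyp = "the model was trained on code switch marathi"
    print(f"WER = {wer(ref, hyp):.2%}")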
