Abstract
Approaches to automatic speech recognition have transited from Hidden Markov Model (HMM)-based ASR to deep neural networks. The advantages of deep neural network approaches are that they can be developed quickly and perform better given large language resources. Nevertheless, dialect speech recognition is still challenging due to the limited resources. Transfer learning approaches have been proposed to improve speech recognition for low resources. In the first approach, the model is pre-trained on a large and diverse labeled dataset to learn the acoustic and language patterns from the speech signal. Then, the model parameters are updated with a new dataset, and the pre-trained model is fine-tuned on a low-resource language dataset. The fine-tuning process is usually completed by freezing the pre-trained layers and training the remaining layers of the model on the low-resource language corpus. Another approach is to use a pre-trained model to capture the compact and meaningful features as input to the encoder. Pre-training in this approach usually involves using unsupervised learning methods to train models on a corpus of large amounts of unmarked data. It enables the model to learn the general patterns and relationships between the input speech signals. This paper proposes a training recipe using transfer learning and Standard Malay models to improve automatic speech recognition for Kelantan and Sarawak Malay dialects.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.