Abstract

Punjabi is a tonal language of the Indo-Aryan language family with speakers around the world. It has gained acceptance in media and communication, and therefore deserves a place in the growing field of automatic speech recognition (ASR), which has already been explored successfully for a number of other Indian and foreign languages. Some work has been done on isolated word speech recognition for Punjabi, but only with whole-word acoustic models; a phone-based approach has yet to be applied. This paper describes an automatic speech recognizer that recognizes isolated word and connected word speech using a triphone-based acoustic model on the HTK 3.4.1 speech engine, and compares its performance with a whole-word acoustic model based ASR system. Word recognition accuracy on isolated word speech was 92.05% for the whole-word system and 97.14% for the triphone system; on connected word speech it was 87.75% for the whole-word system and 91.62% for the triphone system.
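The accuracy figures above can also be read as a relative reduction in word error. The following is a minimal sketch, assuming the word error rate is simply 100 minus the reported word recognition accuracy; the function name `relative_error_reduction` is illustrative and does not come from the paper.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Relative reduction in word error (percent), given two accuracies in percent.

    Assumption: error = 100 - accuracy, which may not match the exact
    scoring used by the authors.
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Accuracies reported in the abstract (whole-word vs. triphone system).
isolated = relative_error_reduction(92.05, 97.14)
connected = relative_error_reduction(87.75, 91.62)

print(f"isolated: {isolated:.1f}%  connected: {connected:.1f}%")
# prints "isolated: 64.0%  connected: 31.6%"
```

Under this assumption, the triphone system removes roughly two thirds of the whole-word system's errors on isolated words and about one third on connected words.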

Highlights

  • Speech is generated when vibrating vocal cords create puffs of air

  • The phone-based acoustic model approach is new to Punjabi automatic speech recognition

  • This paper focuses on implementing an ASR system that recognizes isolated word and connected word speech in the Punjabi language

Summary

Introduction

Speech is generated when vibrating vocal cords create puffs of air. These puffs produce air pressure variations, and it is these variations that give rise to the sensation of hearing. Automatic speech recognition [1, 18] is the process of transforming a speech signal (Figure 1) into text that closely matches the input speech signal. The technique is used extensively in application areas such as voice user interfaces, interactive voice response, enhancing the social interaction capability of handicapped people, and foreign language learning. The acoustic model represents the different ways a word of a particular language can sound. It is built from audio recordings and their transcriptions, which are compiled into statistical representations. The language model provides context information to a speech recognition system: it models the way words are connected to form a sentence. The prior probability of the word, P(W), is provided by the language model, whereas the observation likelihood, P(X|W), is provided by the acoustic model.
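The roles of the two models can be sketched in a few lines. In the standard formulation, a recognizer picks the word W that maximizes P(X|W) · P(W), usually computed in the log domain. The example below is a toy illustration under that assumption, not the paper's HTK setup: the `decode` function, the romanized word labels, and all scores are hypothetical.

```python
import math


def decode(acoustic_loglik, prior):
    """Return the word W maximizing log P(X|W) + log P(W).

    acoustic_loglik: word -> log P(X|W), as an acoustic model would supply.
    prior:           word -> P(W), as a language model would supply.
    """
    return max(
        acoustic_loglik,
        key=lambda w: acoustic_loglik[w] + math.log(prior[w]),
    )


# Hypothetical scores for one utterance X over a three-word vocabulary.
acoustic_loglik = {"ikk": -12.0, "do": -11.5, "tinn": -15.0}
prior = {"ikk": 0.5, "do": 0.2, "tinn": 0.3}

best = decode(acoustic_loglik, prior)
print(best)  # prints "ikk"
```

Here "do" has the highest acoustic likelihood on its own, but the larger prior for "ikk" tips the combined score, which is the kind of context information the language model is meant to contribute.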

Acoustic Phone Model
Punjabi Language
Previous Work
Implementation
Phase 1
Phase 2
Findings
Conclusion & Future Work