Abstract

A Phonetic Engine (PE) is the first stage of an automatic speech recognition system; it converts input speech into a sequence of phonetic symbols. A baseline phonetic engine is built using a Malayalam speech database, and a Graphical User Interface (GUI) is developed for it to perform real-time recognition of phonemes. It is known that higher-level speech information such as intonation, duration, and intensity, collectively referred to as ‘prosody’, aids human speech recognition. Prosody helps to segment speech into sentences or phrases and to disambiguate the recognition process. This has motivated us to incorporate prosody to improve the baseline phonetic engine. However, incorporating prosody in automatic speech recognition is a challenging task. This paper describes an approach to the automatic labeling of prosodic events and discusses the possibility of implementing a prosodically guided phonetic engine for Malayalam. Automatic phrase-like segmentation is realized by detecting long pauses with an Artificial Neural Network (ANN) based classifier. Broad phoneme classification is achieved using features derived directly from the speech signal. A combination of the broad phoneme transcription and pitch trend labels is used to obtain a temporal prosodic pattern. We illustrate the effectiveness of this temporal prosodic pattern for an audio search application.
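
To make the last step concrete, the following is a minimal sketch of how a broad phoneme transcription might be combined with pitch trend labels to form a temporal prosodic pattern. It is not the authors' implementation: the class names (`VOW`, `NAS`, `FRI`), the label set (`R`, `F`, `L`, `-`), the 5 Hz tolerance, the use of 0 to mark unvoiced frames, and the function names are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def pitch_trend(f0: Sequence[float], tol_hz: float = 5.0) -> str:
    """Label an F0 contour as rising (R), falling (F), level (L), or unvoiced (-).

    Assumption: a value of 0 marks an unvoiced frame, and the trend is judged
    from the first and last voiced frames only.
    """
    voiced = [v for v in f0 if v > 0]
    if len(voiced) < 2:
        return "-"  # not enough voiced frames to estimate a trend
    delta = voiced[-1] - voiced[0]
    if delta > tol_hz:
        return "R"
    if delta < -tol_hz:
        return "F"
    return "L"


def temporal_prosodic_pattern(
    segments: List[Tuple[str, Sequence[float]]]
) -> List[str]:
    """Pair each broad phoneme class with the pitch trend of its segment."""
    return [f"{cls}:{pitch_trend(f0)}" for cls, f0 in segments]


# Hypothetical segments: (broad phoneme class, F0 values in Hz per frame).
example = [
    ("VOW", [120.0, 128.0, 140.0]),
    ("NAS", [138.0, 137.0, 139.0]),
    ("FRI", [0.0, 0.0]),
    ("VOW", [150.0, 135.0, 118.0]),
]
pattern = temporal_prosodic_pattern(example)
print(pattern)
```

A sequence of such labels could then be compared between a query and stored audio, which is one plausible way a pattern of this kind might support audio search.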
