Prosodically guided phonetic engine

G Deekshitha, Leena Mary

doi:10.1109/spices.2015.7091457

Abstract

A Phonetic Engine (PE) is the first stage of an automatic speech recognition system; it converts input speech into a sequence of phonetic symbols. A baseline phonetic engine is created using a Malayalam speech database. A Graphical User Interface (GUI) is developed for the phonetic engine to perform real-time recognition of phonemes. It is known that higher-level speech information such as intonation, duration and intensity, collectively referred to as ‘prosody’, aids human speech recognition. Prosody helps to segment speech into sentences/phrases and to disambiguate the recognition process. This has motivated us to incorporate prosody to improve the baseline phonetic engine. However, incorporating prosody in automatic speech recognition is a challenging task. This paper describes an approach to automatic labeling of prosodic events and discusses the possibility of implementing a prosodically guided phonetic engine for Malayalam. Automatic phrase-like segmentation is realized by detecting long pauses with an Artificial Neural Network (ANN) based classifier. Broad phoneme classification is achieved using features derived from the speech at the signal level itself. A combination of broad phoneme transcription and pitch trend labels is used to obtain a temporal prosodic pattern. We illustrate the effectiveness of this temporal prosodic pattern for an audio search application.
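
As a rough illustration of how a broad phoneme transcription might be combined with pitch trend labels into a temporal prosodic pattern, consider the minimal Python sketch below. It is not the authors' implementation: the segment format, the broad class symbols (`V`, `N`, `S`), the trend labels (`R`, `F`, `L`, `U`) and the 5% tolerance are all assumptions chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical segment format (an assumption, not from the paper):
# (broad_class, f0_at_start_hz, f0_at_end_hz); an F0 of 0 marks an unvoiced segment.
Segment = Tuple[str, float, float]


def pitch_trend(f0_start: float, f0_end: float, tol: float = 0.05) -> str:
    """Label the relative F0 change across a segment.

    Returns 'R' (rise), 'F' (fall), 'L' (level), or 'U' (unvoiced).
    The tolerance `tol` is an illustrative value, not a published setting.
    """
    if f0_start <= 0 or f0_end <= 0:
        return "U"
    change = (f0_end - f0_start) / f0_start
    if change > tol:
        return "R"
    if change < -tol:
        return "F"
    return "L"


def temporal_prosodic_pattern(segments: List[Segment]) -> str:
    """Pair each broad phoneme class with its pitch trend label, in time order."""
    return " ".join(f"{cls}:{pitch_trend(start, end)}" for cls, start, end in segments)


# Example with made-up values: vowel, nasal, unvoiced stop, vowel.
example = [("V", 120.0, 140.0), ("N", 140.0, 138.0), ("S", 0.0, 0.0), ("V", 150.0, 120.0)]
pattern = temporal_prosodic_pattern(example)
```

A pattern string of this kind could then be compared between a spoken query and stored audio, for instance with an edit distance, which is one plausible way such a representation might support audio search.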

Full Text