HMM-based phonetic engine for continuous speech of a regional language

Rupinderdeep Kaur, R K Sharma, Parteek Kumar

doi:10.1142/s0217984919502956

Abstract

This paper proposes a Phonetic Engine (PE), a phonetic-level speech recognition system, for continuous speech in Punjabi, a regional language. Punjabi is a highly prosodic language, and very little work of this kind has been done on it. As a first step towards developing the PE, 25 hours of data were collected in three modes: Read Speech, Conversational Speech and Lecture Speech. Ten hours of the collected data were then manually transcribed using the International Phonetic Alphabet (IPA) chart. The architecture of the PE comprises three phases: data preparation, system training and system testing. Initially, a vocabulary of 49 phones was chosen by carefully analyzing the symbol frequency in the IPA transcription, and data files were prepared to train the system accordingly. The prepared data files and speech files were then used for modeling and feature extraction. Mel Frequency Cepstral Coefficients (MFCCs) are used as features and a Hidden Markov Model (HMM) as the classifier. The PE is developed using the Hidden Markov Model Toolkit (HTK). The performance of the PE is evaluated in three ways: (i) by increasing the amount of data from 3 hours to 5 hours, (ii) by decreasing the number of symbols from 49 to 29 and (iii) by increasing the MFCC dimensions from 12 to 36. An accuracy of 72.3% was achieved with 5 hours of data, 29 symbols and 12 MFCCs.
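The abstract reports accuracy without stating how it is computed. As a minimal sketch, assuming the figure follows the convention used by HTK's HResults tool, phone accuracy can be taken as (N - S - D - I) / N x 100, where N is the number of reference phones and S, D and I are the substitutions, deletions and insertions in an alignment of the recognized phone string against the reference. The function names and the example phone strings below are hypothetical, and the alignment uses plain unit edit costs rather than HTK's own alignment penalties.

```python
def align_counts(ref, hyp):
    """Count (substitutions, deletions, insertions) in a minimum-edit
    alignment of hyp against ref, using unit costs (an assumption)."""
    n, m = len(ref), len(hyp)
    # table[i][j] = (total edits, S, D, I) for ref[:i] versus hyp[:j]
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins)            # match, no cost
            else:
                diag = (e + 1, s + 1, d, ins)    # substitution
            e, s, d, ins = table[i - 1][j]
            dele = (e + 1, s, d + 1, ins)        # reference phone missed
            e, s, d, ins = table[i][j - 1]
            inse = (e + 1, s, d, ins + 1)        # extra phone recognized
            table[i][j] = min(diag, dele, inse, key=lambda t: t[0])
    _, s, d, ins = table[n][m]
    return s, d, ins


def phone_accuracy(ref, hyp):
    """Accuracy in percent: (N - S - D - I) / N * 100."""
    s, d, ins = align_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - ins) / len(ref)


# Hypothetical phone strings, for illustration only.
reference = ["k", "a", "m", "a", "l"]
recognized = ["k", "a", "n", "a", "l", "a"]
print(phone_accuracy(reference, recognized))  # 60.0
```

In this example there is one substitution and one insertion against five reference phones, so the accuracy is (5 - 1 - 0 - 1) / 5 x 100 = 60%. Because insertions are penalized, this measure can be lower than the percentage of reference phones recognized correctly.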

Full Text