Using Mel-Mapped Best Tree Encoding for Baseline-Context-Independent-Mono-Phone Automatic Speech Recognition

Amr Gody, Mai Ezz El-Din, Rania Abul Seoud

doi:10.21608/ejle.2015.60254

Open Access

Abstract

Best-Tree Encoding (BTE) was first introduced by Amr M. Gody [1] as a new feature set for the Automatic Speech Recognition (ASR) problem. BTE essentially acts as a spectrum analyzer: it relies on wavelet packets to project the signal power onto predefined filter banks. The feature components are then converted into digital form using an entropy method and a digital encoding procedure. In this research, BTE is developed further by including two more key factors in the BTE process: the Mel scale (MS) and baseband bandwidth mapping (BM). This research provides a baseline performance evaluation for context-independent mono-phone recognition (without grammar) of English using the Vid-TIMIT database. Vid-TIMIT consists of 43 speakers (19 female and 24 male) reciting short sentences. The database was recorded in a noisy environment (mostly computer fan noise) and is not hand-verified. A total of 15,643 phone segments are used for testing and evaluating the newly proposed features. A hidden Markov model (HMM), built with the HTK toolkit and chosen for its popularity in ASR, is used as the recognition engine. The results are evaluated by comparison with Mel-frequency cepstral coefficients (MFCC) on the same database. Although the proposed model gives the same recognition efficiency as MFCC on the same test database, its feature vector requires almost 66% less storage than the MFCC feature vector.
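
To make the wavelet-packet idea in the abstract concrete, the following is a minimal sketch of an entropy-driven best-tree search over one signal frame. It is not the authors' BTE encoding: the Haar filter, the Shannon-type cost function, the node naming ("a" for the low-pass branch, "d" for the high-pass branch), and all function names are assumptions chosen for illustration, and the Mel-scale and bandwidth-mapping steps are not covered. The frame length is assumed to be a power of two.

```python
import math


def haar_split(x):
    """One Haar analysis step: return the (low-pass, high-pass) halves."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def entropy_cost(coeffs):
    """Additive Shannon-type cost: -sum(c^2 * log(c^2)), skipping zeros."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)


def best_tree(x, max_level, path=""):
    """Return (cost, leaves); leaves maps a node path to its band energy.

    A node is split only if its two children together have a strictly
    lower cost than the node itself (illustrative best-basis rule).
    """
    own_cost = entropy_cost(x)
    own = {path or "root": sum(c * c for c in x)}
    if max_level == 0 or len(x) < 2:
        return own_cost, own
    approx, detail = haar_split(x)
    cost_a, leaves_a = best_tree(approx, max_level - 1, path + "a")
    cost_d, leaves_d = best_tree(detail, max_level - 1, path + "d")
    if cost_a + cost_d < own_cost:
        return cost_a + cost_d, {**leaves_a, **leaves_d}
    return own_cost, own


def best_tree_energies(frame, max_level=3):
    """Normalize a frame to unit energy and return its best-tree band energies."""
    energy = sum(v * v for v in frame)
    if energy == 0.0:
        return {"root": 0.0}
    norm = math.sqrt(energy)
    _, leaves = best_tree([v / norm for v in frame], max_level)
    return leaves


if __name__ == "__main__":
    # A constant frame concentrates all of its energy in the lowest band.
    for node, band_energy in best_tree_energies([1.0] * 8).items():
        print(node, round(band_energy, 6))
```

Because the Haar step is orthonormal, the leaf energies of a normalized frame sum to one, so the selected tree and its band energies can be read as a coarse description of how the frame's power is distributed across frequency bands.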