Abstract

Best-Tree Encoding (BTE) is first introduced by Amr M. Gody [1] as new features for Automatic Speech Recognition (ASR) problem. BTE is basically acting as spectrum analyzer. It relies on Wavelet packets to get projection of signal power into predefined filter banks. The feature components are encoded into digital form using certain entropy method and certain digital encoding procedure. In this research BTE is further developed by including two more key factors into the BTE process. The key factors are Mel-scale (MS) and baseband Bandwidth mapping (BM).This Research provides a baseline performance evaluation for Context-independent mono-phone recognition (Without Grammar) of English by using Vid-TIMIT database. Vid-TIMIT consists of 43 speakers (19 female and 24 male), reciting short sentences. The recording of this database was done in a noisy environment (mostly computer fan noise) and also it is not hand verified. Total of 15643 phone segments are used for testing and evaluating the newly proposed features. HMM is used as recognition engine via HTK toolkit for its popularity in ASR. Comparison to MFCC on the same database is considered to evaluate the system results. Although it gives the same recognition efficiency as MFCC on the same testing database, the proposed model saves almost 66% of the required storage than the feature vector of MFCC.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.