Supplementary features for improving phone recognition

Mridul Balaraman, Sorin Dusan, James L. Flanagan

doi:10.1121/1.4784901

Abstract

Traditional speech recognition systems use mel-frequency cepstral coefficients (MFCCs) as acoustic features. This research studies the classification characteristics and performance of four supplementary features (SFs), namely periodicity, zero-crossing rate, log energy, and the ratio of low-frequency energy to total energy, in a phone recognition system built with the Hidden Markov Model Toolkit (HTK). To demonstrate the performance of the SFs, context-independent phone models with a single mixture component are trained on a subset of the TIMIT database (the DR1 data set). When only the SFs and their first derivatives are used (a feature set of dimension 8), the recognition accuracy is 42.96%, compared with 54.65% when 12 MFCCs and their corresponding derivatives are used. Accuracy improves to 56.49% when the SFs and their derivatives are used along with the MFCCs. A further improvement to 60.34% is observed when the last 4 MFCCs and their derivatives are replaced by the SFs and their derivatives, respectively. These results indicate that the supplementary features carry classification information that can be useful in automatic speech recognition.
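The abstract names the four supplementary features but does not give their definitions, so the following is only a minimal sketch of how such per-frame features might be computed. Everything specific here is an assumption rather than the authors' method: periodicity is taken as the peak of the normalized autocorrelation over a 60-400 Hz pitch range, the low-frequency band is cut at 1000 Hz, the spectrum comes from a naive DFT, and the first derivative is a simple central difference (HTK itself uses a regression formula). All function names are hypothetical.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def log_energy(frame, floor=1e-10):
    """Natural log of the frame energy; the floor (assumed) avoids log(0)."""
    return math.log(sum(x * x for x in frame) + floor)


def low_freq_energy_ratio(frame, sample_rate, cutoff_hz=1000.0):
    """Energy at or below cutoff_hz divided by total energy (one-sided DFT).

    The 1000 Hz cutoff is an assumption; the abstract does not state one.
    """
    n = len(frame)
    low = total = 0.0
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        power = re * re + im * im
        total += power
        if k * sample_rate / n <= cutoff_hz:
            low += power
    return low / total if total > 0 else 0.0


def periodicity(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Peak normalized autocorrelation over lags in an assumed pitch range."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def supplementary_features(frame, sample_rate):
    """Return [periodicity, ZCR, log energy, low-frequency energy ratio]."""
    return [
        periodicity(frame, sample_rate),
        zero_crossing_rate(frame),
        log_energy(frame),
        low_freq_energy_ratio(frame, sample_rate),
    ]


def add_deltas(feature_frames):
    """Append a central-difference first derivative to each feature vector.

    Edge frames reuse their nearest neighbour. This is a simplification,
    not HTK's regression-based delta computation.
    """
    last = len(feature_frames) - 1
    out = []
    for i, feats in enumerate(feature_frames):
        prev_f = feature_frames[max(i - 1, 0)]
        next_f = feature_frames[min(i + 1, last)]
        deltas = [(nxt - prv) / 2.0 for prv, nxt in zip(prev_f, next_f)]
        out.append(list(feats) + deltas)
    return out
```

Applying `supplementary_features` to each frame of an utterance and then `add_deltas` to the resulting sequence yields 8-dimensional vectors, matching the dimension quoted above for the SF-only feature set.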
