Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies

Diego F. Silva, Vinícius M. A. de Souza, Gustavo E. A. P. A. Batista, Rafael Giusti

doi:10.1007/978-3-642-34654-5_25

Abstract

Recognition of isolated spoken digits is the core procedure for a large number of important applications, mainly in telephone-based services such as dialing, airline reservation, bank transactions and price quotation, that rely only on speech. Spoken digit recognition is generally a challenging task, since the signals last for a short period of time and some digits are often acoustically very similar to each other. The objective of this paper is to investigate the use of machine learning algorithms for digit recognition. We focus on the recognition of digits spoken in Portuguese; however, we note that our techniques are applicable to any language. We believe that the most important task for successfully recognizing spoken digits is attribute extraction. Audio data is composed of a huge number of very weak features, and most machine learning algorithms are unable to build accurate classifiers from them. We show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition. The results are superior to those obtained with state-of-the-art methods that use Mel-Frequency Cepstrum Coefficients (MFCC) for digit recognition. In particular, we show that the choice of the right attribute extraction method is more important than the specific classification paradigm, and that the right combination of classifier and attributes can provide almost perfect accuracy.
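
As a rough illustration of the attribute extraction step the abstract refers to, the sketch below computes LSF values for a single audio frame using the textbook construction: linear prediction coefficients are estimated with the autocorrelation method, and the LSFs are taken as the angles of the unit-circle roots of the sum and difference polynomials P(z) and Q(z). This is not the authors' pipeline. The function names, the Hamming window, the prediction order and the tolerance `eps` are all assumptions made for the example.

```python
import numpy as np


def lpc_autocorrelation(frame, order):
    """Estimate LPC coefficients [1, a1, ..., ap] for one frame.

    Assumption: autocorrelation method on a Hamming-windowed frame.
    """
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    full = np.correlate(frame, frame, mode="full")
    # Zero lag sits at index len(frame) - 1; keep lags 0..order.
    r = full[len(frame) - 1 : len(frame) + order]
    # Build the Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    # Solve the normal equations R a = -r[1..order].
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def lpc_to_lsf(a, eps=1e-6):
    """Convert LPC coefficients [1, a1, ..., ap] to LSFs in radians.

    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z).
    The LSFs are the root angles strictly between 0 and pi; the
    trivial roots at z = 1 and z = -1 are dropped via `eps`.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate((a, [0.0]))
    a_rev = np.concatenate(([0.0], a[::-1]))
    p_poly = a_ext + a_rev
    q_poly = a_ext - a_rev
    roots = np.concatenate((np.roots(p_poly), np.roots(q_poly)))
    angles = np.angle(roots)
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For a frame `x` and a hypothetical order of 10, `lpc_to_lsf(lpc_autocorrelation(x, 10))` returns ten increasing values between 0 and π, which could then be concatenated across frames to form the attribute vector passed to a classifier.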
