Abstract

Recognition of isolated spoken digits is the core procedure for a large number of important applications, mainly telephone-based services such as dialing, airline reservation, bank transactions, and price quotation, that rely on speech alone. Spoken digit recognition is generally a challenging task because the signals last only a short time and some digits are acoustically very similar to each other. The objective of this paper is to investigate the use of machine learning algorithms for digit recognition. We focus on the recognition of digits spoken in Portuguese, although our techniques are applicable to any language. We believe that the most important task for successfully recognizing spoken digits is attribute extraction. Audio data is composed of a huge number of very weak features, from which most machine learning algorithms cannot build accurate classifiers. We show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition. The results are superior to those obtained with state-of-the-art methods that use Mel-Frequency Cepstral Coefficients (MFCC) for digit recognition. In particular, we show that choosing the right attribute extraction method is more important than the specific classification paradigm, and that the right combination of classifier and attributes can provide almost perfect accuracy.
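As an illustration of the attribute extraction step, the following is a minimal sketch of how LSF coefficients can be derived from a single audio frame using the textbook construction: linear prediction by the autocorrelation method, followed by the roots of the sum and difference polynomials. It is not taken from the paper. The function names `lpc` and `lsf`, the Hamming window, and the prediction order of 10 are all assumptions made for the example.

```python
import numpy as np

def lpc(frame, order):
    """Linear prediction coefficients [1, a1, ..., ap] (autocorrelation method).

    Hypothetical helper; the Hamming window is an assumption.
    """
    w = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    # Toeplitz matrix R[i, j] = r[|i - j|]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def lsf(a):
    """Line spectral frequencies (radians, ascending) from LPC coefficients."""
    # Sum and difference polynomials P(z) and Q(z) built from A(z)
    padded = np.concatenate((a, [0.0]))
    mirrored = np.concatenate(([0.0], a[::-1]))
    p_poly = padded + mirrored
    q_poly = padded - mirrored
    angles = np.angle(np.concatenate((np.roots(p_poly), np.roots(q_poly))))
    # Keep one root per conjugate pair; drop the trivial roots at z = 1 and z = -1
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

if __name__ == "__main__":
    # Synthetic frame standing in for a slice of a spoken digit (an assumption)
    rng = np.random.default_rng(0)
    t = np.arange(240)
    frame = (np.sin(0.3 * t) + 0.5 * np.sin(1.1 * t)
             + 0.1 * rng.standard_normal(t.size))
    print(lsf(lpc(frame, order=10)))
```

Each frame yields as many LSF values as the prediction order, all lying strictly between 0 and π; these per-frame vectors are the kind of attributes that would then be passed to a classifier.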
