Abstract
Recognition of isolated spoken digits is the core procedure for many applications that rely solely on speech for data exchange, such as telephone-based services for dialing, airline reservations, bank transactions and price quotations. Spoken digit recognition is generally challenging because the signals are short and some digits are acoustically very similar to others. The objective of this paper is to investigate the use of machine learning algorithms for spoken digit recognition and to announce to the scientific community the free availability of a database with digits pronounced in English and Portuguese. Since machine learning algorithms depend entirely on predictive attributes to build accurate classifiers, we believe that the most important task for successfully recognizing spoken digits is feature extraction. In this work, we show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients. We evaluated our classifiers in different settings by altering the sampling rate to simulate low-quality channels and by varying the number of coefficients.
Highlights
- We provide a wider set of experimental settings with different numbers of Mel-Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequencies (LSF) coefficients
- Our results show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition
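For readers unfamiliar with LSF, the following is a minimal sketch of how such coefficients can be derived from linear prediction (LPC) coefficients. It is not the authors' implementation: the function name `lpc_to_lsf`, the NumPy root-finding approach and the example filter are assumptions made purely for illustration. The sketch forms the symmetric and antisymmetric polynomials of the prediction filter and takes the angles of their unit-circle roots.

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] into p LSFs in radians.

    Hypothetical helper for illustration; assumes A(z) is minimum phase.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 1 or a.size < 2 or not np.isclose(a[0], 1.0):
        raise ValueError("expected a 1-D array [1, a1, ..., ap]")

    # Symmetric (P) and antisymmetric (Q) polynomials of order p + 1.
    padded = np.append(a, 0.0)
    mirrored = padded[::-1]
    p_poly = padded + mirrored
    q_poly = padded - mirrored

    # For a minimum-phase A(z), the roots of P and Q lie on the unit circle.
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)

    # Keep one root per conjugate pair and drop the trivial roots at z = +1
    # and z = -1, which carry no information about the signal.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


# Example: a 4th-order filter built from two conjugate pole pairs
# (arbitrary illustrative values, not taken from the paper).
example_poles = [0.9 * np.exp(0.6j), 0.9 * np.exp(-0.6j),
                 0.7 * np.exp(2.0j), 0.7 * np.exp(-2.0j)]
example_lpc = np.poly(example_poles).real
example_lsf = lpc_to_lsf(example_lpc)  # four ascending values in (0, pi)
```

In a recognition pipeline, a vector like `example_lsf` would be computed per analysis frame and passed to the classifier as predictive attributes. Filtering angles by a small tolerance is a simplification; deflating the trivial roots by polynomial division would be more robust for high-order filters.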
Summary
In recent decades, research on speech and speaker recognition has attracted an enormous amount of attention, mainly due to the increasing number of applications such as biometric authentication, in which a user's voice is used to allow or deny access to a system, and accessibility, in which a user can control equipment or navigate the Internet using speech, which makes these tasks easier for physically impaired people. An important speech recognition application, especially useful for telephone service providers, is spoken digit recognition. With it, companies can make their services more user-friendly than entering numbers on the telephone keypad. This is even more evident when the procedure is done through mobile devices, which have no physically detached keyboards for dialing.