An efficient speech recognition system for arm‐disabled students based on isolated words

Khalid A Darabkh, Ramzi Saifan, Sharhabeel H Alnabelsi, Saadeh Z Sweidan, Mohammed Hawa, Laila Haddad

doi:10.1002/cae.21884

Abstract

Over the past decades, a need has emerged to empower human‐machine communication systems, which are essential not only for performing actions but also for obtaining information, especially in educational applications. Any such communication system should offer an efficient and easy means of interaction with the lowest possible error rate. The keyboard, mouse, trackball, touch‐screen, and joystick are all examples of tools built to provide mechanical human‐to‐machine interaction. However, a system that accepts oral speech, the natural form of communication between humans, instead of mechanical input can be more practical for students in general and is even a necessity for arm‐disabled students who cannot use their arms to handle traditional educational tools such as pens and notebooks. In this paper, we present a speech recognition system that allows arm‐disabled students to control computers by voice as an aid in the educational process. When a student speaks into a microphone, the speech is divided into isolated words, which are compared against a large predefined database of spoken words to find a match. Each recognized word is then translated into its associated task, which the computer performs, such as opening a teaching application or renaming a file. The speech recognition process discussed in this paper involves two separate approaches: the first is based on double‐threshold voice activity detection and improved Mel‐frequency cepstral coefficients (MFCC), while the second is based on the discrete wavelet transform along with a modified MFCC algorithm. Using the best values for all parameters of these techniques, our proposed system achieved a recognition rate of 98.7% with the first approach and 98.86% with the second approach; the second approach is thus slightly more accurate but slower in processing, which is a critical point for a real‐time system.
Both proposed approaches were compared with other relevant approaches and achieved noticeably higher recognition rates.
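The abstract names double‐threshold voice activity detection as the front end that splits speech into isolated words, but it does not give the detector's parameters. The following is a minimal sketch of a classic energy/zero‐crossing double‐threshold detector, not the authors' implementation. It assumes frame‐level short‐time energy, a high and a low energy threshold set as fractions of the peak frame energy, and a bounded extension over high zero‐crossing‐rate frames; the function names and every default value are hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames; a tail shorter than a frame is dropped."""
    return [list(x[i:i + frame_len]) for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    # Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative).
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def double_threshold_vad(x: Sequence[float], frame_len: int = 256, hop: int = 128,
                         high_ratio: float = 0.5, low_ratio: float = 0.1,
                         zcr_thresh: float = 0.3, max_zcr_frames: int = 3
                         ) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) index pairs of detected words (hypothetical defaults)."""
    frames = frame_signal(x, frame_len, hop)
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    peak = max(energy, default=0.0)
    if peak == 0.0:
        return []  # no frames, or pure silence
    high, low = high_ratio * peak, low_ratio * peak
    segments: List[Tuple[int, int]] = []
    n, i = len(frames), 0
    while i < n:
        if energy[i] < high:
            i += 1
            continue
        # Stage 1: a run of frames above the high threshold is taken as certain speech.
        start = end = i
        while end + 1 < n and energy[end + 1] >= high:
            end += 1
        # Stage 2: widen the run while the energy stays above the low threshold.
        while start > 0 and energy[start - 1] >= low:
            start -= 1
        while end + 1 < n and energy[end + 1] >= low:
            end += 1
        # Stage 3: absorb a few low-energy, high-ZCR frames (often unvoiced consonants).
        k = 0
        while k < max_zcr_frames and start > 0 and zcr[start - 1] >= zcr_thresh:
            start, k = start - 1, k + 1
        k = 0
        while k < max_zcr_frames and end + 1 < n and zcr[end + 1] >= zcr_thresh:
            end, k = end + 1, k + 1
        # Merge with the previous word if the widened segments touch or overlap.
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], max(end, segments[-1][1]))
        else:
            segments.append((start, end))
        i = end + 1
    return segments
```

Two thresholds are used because a single energy threshold either clips the quiet onset and tail of a word (if set high) or triggers on background noise (if set low); the high threshold anchors a word and the low one recovers its edges. Frame indices can be converted to sample offsets by multiplying by `hop`.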
