The goal of this research is to develop voice-controlled wheelchairs that can be operated through inarticulate speech, such as speech affected by severe cerebral palsy or quadriplegia. In this setting, the principal factors limiting recognition performance are large pronunciation variation, caused by the difficulty of stable articulation, and the adverse influence of the wide variety of noise in real environments. To cope with the pronunciation variation, pronunciation lexicons consisting of multiple template reference patterns are used. The reference patterns are represented with subphonetic codes, which describe the variations of inarticulate speech more precisely than ordinary phonetic transcriptions. The pronunciation lexicons are generated by generalizing coded samples into a compact set of templates based on dynamic programming (DP) and data mining. For noise robustness, a voice activity detection method is investigated in order to reject nonspeech sounds such as microphone friction and coughing. Sound source localization using a microphone array is also integrated in order to reject sounds originating outside the wheelchair. These methods are integrated into a system that can be mounted on a wheelchair. A usability test conducted by a patient with severe cerebral palsy in a real environment yielded 95.8% accuracy over 1404 samples on a five-command recognition task. [Work supported by MEXT.]
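The abstract does not detail the DP-based template matching, but the idea can be illustrated with a minimal sketch: an utterance encoded as a sequence of discrete subphonetic codes is aligned against every template of every command with an edit-distance-style DP, and the command with the nearest template wins. The function names, code alphabet, costs, and five-command lexicon below are hypothetical, not the authors' implementation.

```python
# Sketch: DP (edit-distance) matching of a subphonetic code sequence
# against a multi-template pronunciation lexicon. All names and values
# here are illustrative assumptions, not the paper's actual system.

def dp_distance(seq_a, seq_b, sub_cost=1, ins_cost=1, del_cost=1):
    """Edit-distance-style DP between two subphonetic code sequences."""
    n, m = len(seq_a), len(seq_b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j - 1] + cost,   # match / substitute
                          d[i - 1][j] + del_cost,   # skip a code in seq_a
                          d[i][j - 1] + ins_cost)   # skip a code in seq_b
    return d[n][m]

def match_command(utterance_codes, lexicon):
    """Return the command whose nearest template minimizes DP distance.

    lexicon maps command name -> list of template code sequences;
    multiple templates per command absorb pronunciation variation.
    """
    best_cmd, best_dist = None, float("inf")
    for command, templates in lexicon.items():
        for template in templates:
            dist = dp_distance(utterance_codes, template)
            if dist < best_dist:
                best_cmd, best_dist = command, dist
    return best_cmd, best_dist

# Hypothetical five-command lexicon, two templates per command.
lexicon = {
    "forward": [["f", "o", "w", "a"], ["h", "o", "a"]],
    "back":    [["b", "a", "k"], ["b", "a", "a"]],
    "left":    [["l", "e", "f"], ["r", "e", "f"]],
    "right":   [["r", "a", "i"], ["l", "a", "i"]],
    "stop":    [["s", "t", "o"], ["t", "o", "o"]],
}
print(match_command(["h", "o", "w", "a"], lexicon))  # -> ('forward', 1)
```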
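Similarly, the microphone-array rejection of sounds from outside the wheelchair can be sketched with a standard two-microphone time-difference-of-arrival (TDOA) estimate: the cross-correlation peak between the two channels gives a bearing, and sounds outside a frontal acceptance cone are discarded. The array geometry, sampling rate, and angular threshold below are assumptions for illustration only.

```python
# Sketch: two-microphone TDOA bearing estimation used to reject sounds
# arriving from outside an acceptance cone. Geometry and thresholds are
# hypothetical, not the authors' configuration.
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
MIC_SPACING = 0.2        # m between the two microphones (assumed)
SAMPLE_RATE = 16000      # Hz (assumed)

def estimate_tdoa(left, right, max_lag):
    """Time difference of arrival (s) via the cross-correlation peak."""
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-len(right) + 1, len(left))
    keep = np.abs(lags) <= max_lag          # only physically possible lags
    best = lags[keep][np.argmax(corr[keep])]
    return best / SAMPLE_RATE

def is_from_user(left, right, max_angle_deg=30.0):
    """Accept the sound only if its bearing lies within the cone."""
    max_lag = int(np.ceil(MIC_SPACING / SPEED_OF_SOUND * SAMPLE_RATE))
    tdoa = estimate_tdoa(left, right, max_lag)
    # Far-field model for a 2-mic array: sin(theta) = c * tdoa / d
    s = np.clip(SPEED_OF_SOUND * tdoa / MIC_SPACING, -1.0, 1.0)
    angle = np.degrees(np.arcsin(s))
    return abs(angle) <= max_angle_deg

# Synthetic check: one channel delayed by 3 samples (a near-frontal source).
rng = np.random.default_rng(0)
sig = rng.standard_normal(4096)
left, right = sig[:-3], sig[3:]
print(is_from_user(left, right))  # small TDOA -> within the cone -> True
```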