Abstract

Human-Machine Interaction (HMI) systems demand the use of multiple modalities for reliable interaction. Research on these systems began with audio signals for speech recognition and is now progressing towards the cooperation of other biosignals. This paper presents an Automatic Speech Recognition (ASR) system based on single and multiple modalities, namely audio and Electroencephalogram (EEG) signals, to explore speech recognition. It extracts speech information concealed in audio and in ten channels of imagined EEG (EEG-i) and vocalized EEG (EEG-v) signals. Three Wavelet Transform (WT) methods, Discrete Wavelet Transform (DWT), Wavelet Packet Decomposition (WPD), and a hybrid of DWT and WPD (DWPD), with four-level decomposition are used to transform the signals into WT coefficients. Six statistical parameters are then computed from the WT coefficients to generate 63 (2^6 - 1) feature vectors for each method. An exhaustive search over the 63 feature vectors is conducted to determine the parameter combination that attains the best accuracy with an Artificial Neural Network (ANN) classifier. Accuracy is then further improved by applying five-level decomposition to the WPD coefficients along with the best parameter combination. Results include the accuracy of both unimodal and multimodal ASR. The WPD method achieved the best accuracies of 74.48%, 56.29%, 42.02%, 77.97%, and 78.90% for multiclass classification of prompts + words based on audio, EEG-i, EEG-v, audio + EEG-i, and audio + EEG-v, respectively. This indicates that speech recognition from EEG signals is possible and that fusing audio with EEG enhances the recognition rate over either modality alone. The results also show that the proposed method outperforms other methods in the area.

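The pipeline described above (wavelet decomposition followed by per-band statistics) can be illustrated with a minimal sketch. The snippet below computes a four-level DWT of a one-dimensional signal with PyWavelets and derives six statistics per coefficient band; the choice of mother wavelet ('db4') and the specific six statistics (mean, standard deviation, variance, skewness, kurtosis, energy) are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: 4-level DWT feature extraction for one signal frame.
# Assumptions (not from the paper): 'db4' wavelet and the six statistics below.
import numpy as np
import pywt
from scipy.stats import skew, kurtosis

def dwt_feature_vector(signal, wavelet="db4", level=4):
    # pywt.wavedec returns [cA_level, cD_level, ..., cD_1] coefficient bands
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    features = []
    for band in coeffs:
        features.extend([
            np.mean(band),      # mean
            np.std(band),       # standard deviation
            np.var(band),       # variance
            skew(band),         # skewness
            kurtosis(band),     # kurtosis
            np.sum(band ** 2),  # energy
        ])
    return np.asarray(features)

# Example: a synthetic 1-second frame sampled at 16 kHz
frame = np.random.randn(16000)
print(dwt_feature_vector(frame).shape)  # 5 bands x 6 statistics = (30,)
```

The exhaustive search in the paper then evaluates every non-empty subset of the six statistics (2^6 - 1 = 63 combinations) with the ANN classifier to find the best-performing feature combination.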