Abstract
Many automatic speech recognition (ASR) applications operate under noisy background conditions; hence, robustness has become an important area of research. Recognizing speech from visual features together with audio has proved useful under these conditions. This paper proposes using visual information to improve the recognition performance and robustness of a Hindi viseme recognition system. A database was prepared comprising ten Hindi sentences uttered by five different speakers. Audio features based on mel-frequency cepstral coefficients (MFCCs) were extracted, and a subspace-based discrete cosine transform (DCT) was applied to extract visual features. The visual features were integrated with the audio features and passed to a discriminant-function-based classifier for five Hindi viseme classes. Integrating visual features improved viseme recognition for both clean and noisy speech. Maximum improvement of 6.67% in accuracy ...
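The pipeline described in the abstract (visual DCT features, integration with audio features, a discriminant-function classifier) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: a plain 2-D DCT with a low-frequency block stands in for the subspace-based DCT, the audio vector is taken as an already-computed MFCC vector, integration is done by simple concatenation, and the classifier is a generic linear discriminant with a shared covariance. All function and class names are hypothetical.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def visual_dct_features(roi, keep=4):
    """2-D DCT of a grayscale mouth region; keep the low-frequency block.

    Assumption: the paper's subspace-based DCT is replaced here by a plain
    2-D DCT followed by selecting the top-left keep x keep coefficients.
    """
    h, w = roi.shape
    coeffs = dct2_matrix(h) @ roi @ dct2_matrix(w).T
    return coeffs[:keep, :keep].ravel()


def fuse(audio_vec, visual_vec):
    """Feature-level integration by concatenation (an assumption)."""
    return np.concatenate([audio_vec, visual_vec])


class LinearDiscriminant:
    """Linear discriminant functions with a shared covariance matrix."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        means = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        # Pooled within-class covariance, lightly regularized for stability.
        centered = X - means[np.searchsorted(self.classes_, y)]
        cov = centered.T @ centered / (len(X) - len(self.classes_))
        cov += 1e-6 * np.eye(X.shape[1])
        inv = np.linalg.inv(cov)
        priors = np.array([(y == c).mean() for c in self.classes_])
        # g_c(x) = x . (inv mu_c) - 0.5 mu_c . (inv mu_c) + log prior_c
        self.W_ = means @ inv
        self.b_ = -0.5 * np.sum(self.W_ * means, axis=1) + np.log(priors)
        return self

    def predict(self, X):
        scores = X @ self.W_.T + self.b_
        return self.classes_[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder inputs: a 13-dimensional MFCC vector and a 16 x 16 mouth
    # region per frame (both sizes are assumptions), five viseme classes.
    X, y = [], []
    for label in range(5):
        for _ in range(20):
            audio = rng.normal(label, 1.0, size=13)
            roi = rng.normal(label, 1.0, size=(16, 16))
            X.append(fuse(audio, visual_dct_features(roi)))
            y.append(label)
    X, y = np.array(X), np.array(y)
    clf = LinearDiscriminant().fit(X, y)
    print("training accuracy:", (clf.predict(X) == y).mean())
```

In practice the audio vector would come from an MFCC front end and the mouth region from a face or lip tracker; the synthetic data above only exercises the code path.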