Abstract

In this paper, a new set of visual speech features based on the Histogram of Oriented Gradients (HOG) is proposed to improve the robustness of bimodal Hindi speech recognition. To extract the visual features, the energy per block is calculated using HOG by finding the gradient magnitude of the pixels in each cell, in both the x and y directions, from the region of interest (ROI). The advantage of the proposed scheme is that it reduces the dimensionality of the visual feature vectors while retaining the full information of the lip region. For a comparative study, four sets of visual features are extracted from the AMUAV corpus: Set A (Two-Dimensional Discrete Cosine Transform (2D-DCT) features), Set B (Two-Dimensional Discrete Wavelet Transform followed by DCT (2D-DWT-DCT)), Set C (Static-HOG) and Set D (Dynamic-HOG). Standard Mel Frequency Cepstral Coefficients (MFCC), followed by static and dynamic MFCC features, were used as baseline features. A maximum improvement in WRA of 12.73% over the baseline features is reported using the proposed feature sets.
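As a rough illustration of the block-energy idea described above, the following Python sketch computes the x and y gradients of a grayscale ROI, forms the gradient magnitude per pixel, and sums the squared magnitudes over non-overlapping blocks. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `hog_block_energy`, the 8-pixel block size, the use of central differences, and the definition of energy as a sum of squared magnitudes are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def hog_block_energy(roi, block=8):
    """Return one energy value per non-overlapping block of a grayscale ROI.

    Hypothetical helper: block size and energy definition are assumptions.
    """
    roi = np.asarray(roi, dtype=np.float64)
    # np.gradient returns derivatives along axis 0 (rows, y) and axis 1 (columns, x).
    gy, gx = np.gradient(roi)
    # Gradient magnitude at every pixel.
    mag = np.hypot(gx, gy)
    # Crop so that height and width are exact multiples of the block size.
    h = (roi.shape[0] // block) * block
    w = (roi.shape[1] // block) * block
    mag = mag[:h, :w]
    # Split into blocks and sum the squared magnitudes inside each block.
    blocks = mag.reshape(h // block, block, w // block, block)
    energy = (blocks ** 2).sum(axis=(1, 3))
    # Flatten to a single feature vector for the frame.
    return energy.ravel()

# Illustrative input: a 32x64 ROI whose intensity increases by 1 per column.
ramp_roi = np.tile(np.arange(64, dtype=np.float64), (32, 1))
ramp_features = hog_block_energy(ramp_roi)
flat_features = hog_block_energy(np.full((32, 64), 7.0))
```

With these illustrative sizes, a 32x64 ROI (2048 pixels) is reduced to a 32-element vector per frame, which is the kind of dimensionality reduction the abstract refers to. The actual ROI size, cell and block layout, and any orientation binning used in the paper may differ.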
