Block Energy Based Visual Features Using Histogram Of Oriented Gradient For Bimodal Hindi Speech Recognition

Prashant Upadhyaya, Omar Farooq, M.R Abidi

doi:10.1016/j.procs.2018.05.066

Open Access

https://doi.org/10.1016/j.procs.2018.05.066

Journal: Procedia Computer Science	Publication Date: Jan 1, 2018
Citations: 1	License type: cc-by-nc-nd

Affiliation: Aligarh Muslim University

Abstract

In this paper, a new set of visual speech features based on the Histogram of Oriented Gradient (HOG) is proposed to improve the robustness of bimodal Hindi speech recognition. To extract the visual features, the energy per block is calculated using HOG by finding the gradient magnitude of the pixels in each cell, in both the x and y directions, from the region of interest (ROI). The advantage of the proposed scheme is that it reduces the dimensionality of the visual feature vectors while retaining the full information of the lip region. For a comparative study, four sets of visual features are extracted from the AMUAV corpus: Set A (Two-Dimensional Discrete Cosine Transform (2D-DCT) features), Set B (Two-Dimensional Discrete Wavelet Transform followed by DCT (2D-DWT-DCT)), Set C (Static-HOG) and Set D (Dynamic-HOG). Standard Mel Frequency Cepstral Coefficients (MFCC), followed by static and dynamic MFCC features, were used as baseline features. A maximum improvement in WRA of 12.73% over the baseline features is reported using the proposed feature sets.
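
The block-energy idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names `gradient_magnitude` and `block_energies`, the central-difference gradients with clamped borders, the non-overlapping 8x8 blocks, and the definition of energy as the sum of squared gradient magnitudes. It should not be read as the authors' implementation.

```python
import math

def gradient_magnitude(roi):
    """Per-pixel gradient magnitude from central differences in x and y."""
    h, w = len(roi), len(roi[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Border handling by index clamping is an assumption;
            # HOG variants differ on this point.
            gx = roi[y][min(x + 1, w - 1)] - roi[y][max(x - 1, 0)]
            gy = roi[min(y + 1, h - 1)][x] - roi[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
    return mag

def block_energies(roi, block=8):
    """One value per non-overlapping block: sum of squared magnitudes.

    'Energy' as a sum of squares is an assumed definition, not taken
    from the paper.
    """
    mag = gradient_magnitude(roi)
    h, w = len(mag), len(mag[0])
    feats = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            energy = sum(mag[y][x] ** 2
                         for y in range(by, by + block)
                         for x in range(bx, bx + block))
            feats.append(energy)
    return feats

# Hypothetical 16x16 grayscale ROI: a horizontal intensity ramp.
ramp_roi = [[float(x) for x in range(16)] for _ in range(16)]
ramp_features = block_energies(ramp_roi, block=8)
```

Under these assumptions a 16x16 ROI (256 pixel values) is summarised by four block energies, which is the kind of dimensionality reduction the abstract refers to; the actual ROI size and block layout used in the paper are not stated here.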

Full Text