Continuous Bangla Speech Processing: Segmentation, Classification and Recognition

Dr M M Rahman

doi:10.9734/bpi/mono/978-93-5547-173-4

Abstract

This study concerns the development of a continuous speech recognition system for the Bangla language, covering speech word segmentation, feature extraction, speech word classification, and recognition. It proposes four dynamic-thresholding algorithms to segment continuous Bangla speech sentences into words/sub-words: (i) Algorithm-1 (based on a modified k-means algorithm), (ii) Algorithm-2 (based on the fuzzy µ-means algorithm), (iii) Algorithm-3 (based on a modified Otsu’s algorithm), and (iv) Algorithm-4 (based on short-time speech features). The study also introduces a new approach, named the blocking black area method, to identify the voiced regions of continuous speech during segmentation. The segmented words are classified according to their number of syllables, using an efficient approach named syllable-based classification. For feature extraction, speech spectrogram features and short-time speech features are analyzed. For speech feature generation, three types of features are proposed: (i) short-time speech features, (ii) binary features, and (iii) MFCC features. The short-time speech features are used in speech segmentation, and the binary features are used in both speech segmentation and recognition. In MFCC feature generation, various windowing functions are applied. For speech recognition, a comprehensive study of neural networks is carried out, with a performance analysis of several improved and faster back-propagation (BP) algorithms (BP with momentum, variable learning rate BP, resilient BP, conjugate gradient BP, and Levenberg-Marquardt BP). The MATLAB Neural Network Toolbox 9.8.0 (R2020a) was used to create, train, and simulate the feedforward neural network with the BP learning algorithm.
Because the standard BP algorithm converges very slowly, this study proposes the use of several improved and faster BP algorithms for the speech recognition problem. The developed system was evaluated on continuously spoken Bangla sentences. To test its performance, 100 well-defined Bangla sentences, containing 656 words in total, were recorded from each of 5 male speakers of different ages. The speech database therefore contains 500 Bangla speech sentences with 3,280 speech words. The segmentation system achieved an average segmentation accuracy of 95.55% with Algorithm-1 (based on a modified k-means algorithm), 96.19% with Algorithm-2 (based on the fuzzy µ-means algorithm), 90.58% with Algorithm-3 (based on a modified Otsu’s algorithm), and 95.9% with Algorithm-4 (based on short-time speech features). The classification system achieved an average accuracy of 91.42%. The recognition system achieved a recognition rate of 83% using the resilient BP algorithm, 90% using the conjugate gradient BP algorithm, and 90% using the Levenberg-Marquardt BP algorithm when recognizing segmented speech words.
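
To make the idea of dynamic thresholding for segmentation concrete, the following is a minimal Python sketch, not the method developed in this study. It assumes the textbook (unmodified) Otsu threshold applied to short-time frame energy, and it runs on a synthetic two-burst signal rather than recorded Bangla speech. The function names, the frame length of 400 samples, the hop of 160 samples, and the 64-bin histogram are all illustrative assumptions.

```python
import numpy as np

def short_time_energy(signal, frame_len=400, hop=160):
    """Energy of each overlapping frame (frame_len and hop are assumed values)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.sum(frames.astype(float) ** 2, axis=1)

def otsu_threshold(values, n_bins=64):
    """Textbook Otsu: choose the bin edge that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=n_bins)
    prob = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(prob)                  # weight of the low-energy class
    w1 = 1.0 - w0                         # weight of the high-energy class
    cum_mean = np.cumsum(prob * centers)
    total_mean = cum_mean[-1]
    valid = (w0 > 1e-12) & (w1 > 1e-12)   # skip splits that leave a class empty
    between = np.zeros(n_bins)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (w0[valid] * w1[valid])
    return edges[np.argmax(between) + 1]

def voiced_segments(mask):
    """Convert a boolean frame mask into (start_frame, end_frame) pairs, end exclusive."""
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))

# Synthetic stand-in for a two-word utterance: silence, tone, silence, tone, silence.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(int(0.3 * sr)) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
silence = np.zeros(int(0.2 * sr))
signal = np.concatenate([silence, tone, silence, tone, silence])
signal = signal + 0.001 * rng.standard_normal(len(signal))  # low-level background noise

energy = short_time_energy(signal)
threshold = otsu_threshold(energy)        # recomputed per utterance, hence "dynamic"
segments = voiced_segments(energy > threshold)
print(segments)
```

Because the threshold is recomputed from each utterance's own energy distribution, it adapts to recording level without a hand-tuned constant. Real speech would likely need smoothing and a minimum-duration rule on top of this.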
