Abstract

Stuttering is a neurodevelopmental disorder in which the normal flow of speech is disrupted. Traditionally, Speech-Language Pathologists assessed the extent of stuttering by counting speech disfluencies manually. Such assessments are subjective, inconsistent, time-consuming, and error-prone. The present study focuses on the objective assessment of speech disfluencies such as prolongation and syllable, word, and phrase repetition. The proposed method combines the Weighted Mel Frequency Cepstral Coefficient (WMFCC) feature extraction algorithm with a deep-learning Bidirectional Long-Short Term Memory (Bi-LSTM) neural network for the classification of stuttered events. The work uses the UCLASS stuttering dataset for analysis. The speech samples of the database are first pre-processed, manually segmented, and labeled by disfluency type. The labeled speech samples are then parameterized as WMFCC feature vectors, and the extracted features are fed to the Bi-LSTM network for training and testing of the model. The effect of different hyper-parameters on the classification results is examined. The test results show that the proposed method reaches a best accuracy of 96.67%, higher than that of the LSTM model. Recognition accuracies of 97.33%, 98.67%, 97.5%, 97.19%, and 97.67% were achieved for the detection of fluent speech, prolongation, syllable repetition, word repetition, and phrase repetition, respectively.

Highlights

  • For communication between human beings, speech is the most habitual and widely used verbal means of expressing feelings, ideas, and thoughts

  • This section discusses the efficacy and performance of the proposed algorithm, based on Weighted Mel Frequency Cepstral Coefficients (WMFCC) feature extraction and Bidirectional Long-Short Term Memory (Bi-LSTM) classification, for four forms of disfluency

  • 14-dimensional acoustic features were extracted from the segmented samples using the WMFCC feature extraction algorithm


Summary

INTRODUCTION

Speech is the most habitual and widely used verbal means by which human beings express feelings, ideas, and thoughts. Speech-Language Pathologists (SLPs) used to assess the extent of stuttering manually: they counted the stuttered events and divided that count by the total number of spoken words. The proposed work employs the Weighted Mel Frequency Cepstral Coefficients (WMFCC) feature extraction method and a deep-learning-based classification method, Bidirectional Long-Short Term Memory (Bi-LSTM), for the automatic assessment of four forms of disfluency: prolongation and syllable, word, and phrase repetition. WMFCC includes the dynamic information of the speech samples, which increases the detection accuracy of stuttered events and reduces the computational overhead of the classification stage. For classification, the work employs Bi-LSTM rather than a traditional RNN or LSTM.
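The way WMFCC can carry dynamic information without enlarging the feature vector can be illustrated with a minimal sketch. It assumes a common formulation in which each frame's static coefficients are summed with weighted first-order (delta) and second-order (delta-delta) differences. This summary does not give the paper's exact formula, so the weights `p` and `q`, the simplified two-point delta, and the function names `delta` and `weighted_mfcc` are all hypothetical choices made for illustration. Precomputed MFCC frames are taken as given.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame]) -> List[Frame]:
    """Simplified first-order difference along time (assumed form):
    a two-point central difference with the edge frames repeated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def weighted_mfcc(mfcc: List[Frame], p: float = 0.5, q: float = 0.25) -> List[Frame]:
    """Fold dynamic information into the static coefficients:
    w = c + p * delta(c) + q * delta(delta(c)).
    The weights p and q are illustrative assumptions, not the paper's values."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [
        [c + p * a + q * b for c, a, b in zip(fc, fd1, fd2)]
        for fc, fd1, fd2 in zip(mfcc, d1, d2)
    ]


# Toy input: 5 frames of 14 coefficients each (placeholder values, not real speech).
toy = [[float(t)] * 14 for t in range(5)]
features = weighted_mfcc(toy)
print(len(features), len(features[0]))  # prints: 5 14
```

In this sketch the output keeps the 14 dimensions of the static vector, whereas concatenating static, delta, and delta-delta coefficients would triple it. That is one plausible reading of the reduced overhead at the classification stage.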

RELATED WORKS
Result
CONSTRUCTION OF MODEL
Signal Pre-Processing
Disfluent Speech Sample Segmentation and Labeling
Labeled Samples Splitting
WMFCC Feature Extraction
Bi-Directional Long-Short Term Memory
Bi-LSTM Model Training and Testing
EXPERIMENTS AND RESULTS
Adjustments of Parameters
Analysis of Experimental Results
CONCLUSION
