Abstract

Speech plays a vital role in communication: from expressing oneself to using speech-based platforms, it is a necessity. Any disruption in speech is referred to as a disfluency and can impact a person's quality of life. This paper presents an experimental study of techniques for the detection and classification of speech disfluencies. Six types of disfluency are examined, namely Interjection, Sound Repetition, Word Repetition, Phrase Repetition, Revision, and Prolongation (6 classes). The paper goes a step further by including clean speech as an additional class alongside the six disfluencies, making the task a more robust 7-class problem. Several machine learning approaches are investigated on the University College London Archive of Stuttered Speech (UCLASS), a standard disfluency dataset created by University College London (UCL). Five feature extraction techniques are used: Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), Gammatone Frequency Cepstral Coefficients (GFCC), Mel-filterbank energy features, and spectrograms. A comparative analysis of classifiers shows that MFCC, GFCC, and spectrogram features achieve greater than 90% accuracy on both the 6-class and 7-class tasks with the kNN classifier. As future work, the authors aim to tackle the challenge of detecting multiple disfluencies present simultaneously in a single speech sample.
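
The sketch below illustrates the kind of pipeline the abstract describes for one of its feature/classifier pairs (mean-pooled MFCC features fed to a kNN classifier). It is a minimal illustration, not the authors' implementation: the library choices (librosa, scikit-learn), the 13-coefficient setting, the mean-pooling step, the neighbour count, and the file/label layout are all assumptions made here for clarity.

```python
# Illustrative MFCC + kNN pipeline; hyperparameters and data layout are assumed.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 6 disfluency classes plus clean speech (7-class setting from the abstract).
LABELS = ["interjection", "sound_repetition", "word_repetition",
          "phrase_repetition", "revision", "prolongation", "clean"]

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load a speech clip and summarise it as the mean MFCC vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                # fixed-length vector

def evaluate(clips):
    """clips: list of (wav_path, label_index) pairs, e.g. segmented UCLASS audio."""
    X = np.stack([mfcc_features(p) for p, _ in clips])
    y = np.array([lbl for _, lbl in clips])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    return accuracy_score(y_te, knn.predict(X_te))
```

Swapping `librosa.feature.mfcc` for a GFCC or spectrogram front end, or replacing `KNeighborsClassifier` with another scikit-learn estimator, would reproduce the other comparisons mentioned in the abstract under the same assumptions.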
