Abstract
The development and popularity of voice-user interfaces made spontaneous speech processing an important research field. One of the main focus areas in this field is automatic speech recognition (ASR) that enables the recognition and translation of spoken language into text by computers. However, ASR systems often work less efficiently for spontaneous than for read speech, since the former differs from any other type of speech in many ways. And the presence of speech disfluencies is its prominent characteristic. These phenomena are an important feature in human-human communication and at the same time they are a challenging obstacle for the speech processing tasks. In this paper we address an issue of voiced hesitations (filled pauses and sound lengthenings) detection in Russian spontaneous speech by utilizing different machine learning techniques, from grid search and gradient descent in rule-based approaches to such data-driven ones as ELM and SVM based on the automatically extracted acoustic features. Experimental results on the mixed and quality diverse corpus of spontaneous Russian speech indicate the efficiency of the techniques for the task in question, with SVM outperforming other methods.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have