Abstract

The paper describes the key concepts of a word spotting system for Russian based on large vocabulary continuous speech recognition (LVCSR). Key algorithms and system settings are described, including the pronunciation variation algorithm, and experimental results on real-life telecom data are provided. The system architecture and the user interface are also described. The system is based on the CMU Sphinx open-source speech recognition platform and on the linguistic models and algorithms developed by Speech Drive LLC. The effective combination of baseline statistical methods, real-world training data, and intensive use of linguistic knowledge led to a quality of results applicable to industrial use.

Highlights

  • The need to understand business trends, ensure public security, and improve the quality of customer service has driven the sustained development of speech analytics systems, which transform speech data into a measurable and searchable index of words, phrases, and paralinguistic markers

  • Most recently, a number of innovative approaches to spoken term detection have been proposed, such as the combination of various recognition systems and score normalization, with a reported 20% increase in spoken term detection quality [7, 8]

  • The most well-known systems include Yandex SpeechKit [19], used to recognize spoken search queries via web and mobile applications; the real-time speech recognition system by Speech Technology Center [20], used for transcribing speech in broadcast news; the LVCSR system developed by SPIIRAS [21, 22], used for recognizing speech in multimodal environments; and the speech recognition system by the scientific institute Specvuzavtomatika [23], based on deep neural networks

Summary

Introduction

The need to understand business trends, ensure public security, and improve the quality of customer service has driven the sustained development of speech analytics systems, which transform speech data into a measurable and searchable index of words, phrases, and paralinguistic markers. Compared to more widespread text-based systems, this approach makes use of spoken examples of a keyword to build up a word-based model and search within speech data. The advent of the Internet has made a rich amount of data available to the speech recognition community [16]. This is of particular interest for low-resource languages; among the most recent improvements, [17] suggests an approach to effectively deal with the challenge of normalizing and filtering web data for keyword spotting. The most well-known systems include Yandex SpeechKit [19], used to recognize spoken search queries via web and mobile applications; the real-time speech recognition system by Speech Technology Center [20], used for transcribing speech in broadcast news; the LVCSR system developed by SPIIRAS [21, 22], used for recognizing speech in multimodal environments; and the speech recognition system by the scientific institute Specvuzavtomatika [23], based on deep neural networks.

Key System Parameters
Experimental Results
System Architecture and User Interface
Conclusion and Further Plans