Wake-up-word speech recognition application for first responder communication enhancement

Veton Këpuska, Jason Breitfeller, Edward M Carapezza

doi:10.1117/12.666025

Abstract

Speech Recognition (SR) systems have historically proven cumbersome and insufficiently accurate for a range of applications. The ultimate goal of our proposed technology is to fundamentally change the way current SR systems interact with humans and to develop an application that is extremely hardware efficient. Accurate SR with reasonable hardware requirements will afford the average first responder, e.g., a police officer, a true breakthrough technology that will change the way an officer performs his duties. The presented technology provides a cutting-edge solution for human-machine interaction by properly solving the Wake-Up-Word (WUW) SR problem. This paradigm shift provides the basis for the development of SR systems with truly Voice Activated capabilities, impacting all SR-based technologies and the way in which humans interact with computers. It is a radical departure from the push-to-talk paradigm currently applied to all speech-to-text or speech-recognition applications. Achieving this goal requires a significantly more accurate pattern classification and scoring technique, which in turn gives SR systems enhanced performance for correct recognition (i.e., minimization of false rejection) as well as correct rejection (i.e., minimization of false acceptance). A revolutionary and innovative classification and scoring technique is used that is a significant enhancement over an earlier method presented in reference [1]. The solution in reference [1] has been demonstrated to meet the stringent requirements of the WUW-SR task. The advanced solution of [1] is a novel technique that is model and algorithm independent. Therefore, it could be used to significantly improve the performance of existing recognition algorithms and systems. Reductions in error rates of over 40% are commonly observed for both false rejections and false acceptances.
This paper presents the architecture of a WUW-SR based system that serves as an interface to current SR applications. In this system, WUW-SR is used as a gateway for truly Voice Activated applications, utilizing the current solution without the push-to-talk paradigm. The technique has been developed with hardware optimization in mind and can therefore run as a background application on a standard Windows-based PC platform.

© (2006) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.
