INTRODUCTION

An Interactive Voice Response (IVR) system is a platform for human-machine interaction by voice or keypad. Examples abound: when one calls most large organizations, the first encounter is with a machine that prompts the caller for their intent. Such a machine usually either offers options to choose from (Directed Dialog) or asks for free-form input (Open Dialog). With Open Dialog, there is a risk that the machine does not understand the caller's input, and much investigation goes into determining why recognition fails.

Recognizing keyed input is less challenging than recognizing speech because each key on the keypad produces a unique combination of two tones, one from a low-frequency group and one from a high-frequency group, which cannot be confused with the combination for any other key. This technology is called Dual Tone Multi Frequency (DTMF)(i), and it is mature because there is little or no variability in the tones emitted by a particular key. This is not the case with speech.

In speech recognition, several variables come into play. These include whether the caller barges in on (interrupts) a prompt, whether there is background noise in a frequency range similar to that of the spoken utterance, and whether the caller is using a cell phone, a speakerphone, or a computer. These and other factors affect how well an IVR system recognizes the caller's input. This paper uses ROC analysis to establish guidelines for determining the best settings under which an IVR system should accept a caller's input.

REVIEW OF LITERATURE

Receiver Operating Characteristic (ROC) analysis has been used in medical imaging to measure diagnostic accuracy (Metz, 2008; Pepe, 2000; Griner, Mayewski, Mushlin, & Greenland, 1981). McClish (1989) used this technique to analyze the accuracy of disease diagnosis.
He preferred this technique because it provides the investigator with all possible combinations of sensitivity and specificity. ROC analysis has also been used in radiology (Metz & Obuchowski, 2003) and in biomedical informatics (Lasko, Bhagwat, Zou, & Ohno-Machado, 2005; Brown & Davis, 2006; Hand & Till, 2001). It is also central to Signal Detection Theory (Green & Swets, 1966), where it provides a precise language and graphical notation for analyzing decision-making under uncertainty. ROC curves are used extensively in epidemiology and medical research and are frequently mentioned in conjunction with evidence-based medicine (Zweig & Campbell, 1993). Bond and DePaulo (2006) used ROC analysis to study the accuracy of more than 20,000 deception judgments. They concluded that its results correlated strongly with those of other methods of analysis. In Artificial Intelligence (Fogarty, Baker, & Hudson, 2005), ROC curves have proved useful for evaluating machine learning techniques (Flach, 2004; Fawcett, 2006).

This paper extends ROC analysis to speech recognition. If the IVR engine understands an utterance clearly (with high or medium confidence), the caller proceeds further down the call flow. If the engine is uncertain of the caller's input, it re-prompts the caller to confirm that it identified the original intent correctly. If the input is still not clearly understood after the second recognition attempt, the engine transfers the caller to a live agent. The IVR engine is designed to minimize (and possibly eliminate) the cost of such transfers to a live agent.

THE ENVIRONMENT

The Interactive Voice Response (IVR) environment consists of a platform for collecting and analyzing caller utterances using a voice recognizer.
The quality of the categorization varies with the recognizer's parameter settings. The two main parameters of the recognizer are the energy floor and the confidence threshold. …
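The excerpt ends before the paper describes its method. The following is therefore only a minimal sketch of how ROC analysis could be applied to the confidence threshold, and its assumptions are ours, not the paper's. Each hypothetical test utterance is represented by the recognizer's confidence score and a flag that says whether the recognized result was actually correct. The recognizer accepts an utterance when its score is at or above the threshold. Sweeping the threshold yields one (false-accept rate, true-accept rate) point per setting. The area under the curve (AUC, computed by the trapezoidal rule) summarizes performance across all thresholds. Youden's J statistic (TPR minus FPR) serves here as one common, illustrative way to pick an operating threshold. The function names `roc_points`, `auc`, and `best_threshold` and all sample values are hypothetical, and the sketch does not model the energy floor.

```python
from typing import List, Tuple

# (confidence score, recognized result was correct) for one test utterance.
# Both this representation and the sample values below are hypothetical.
Sample = Tuple[float, bool]
RocPoint = Tuple[float, float, float]  # (threshold, FPR, TPR)


def roc_points(samples: List[Sample]) -> List[RocPoint]:
    """Sweep the confidence threshold and return one ROC point per setting.

    An utterance is accepted when its confidence >= threshold.
    TPR (true-accept rate)  = accepted correct results / all correct results.
    FPR (false-accept rate) = accepted wrong results / all wrong results.
    """
    positives = sum(1 for _, correct in samples if correct)
    negatives = len(samples) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both correct and incorrect recognitions")
    # Start above every score so the curve begins at (0, 0), then descend
    # through each distinct score so the curve ends at (1, 1).
    thresholds = [float("inf")] + sorted({s for s, _ in samples}, reverse=True)
    points: List[RocPoint] = []
    for t in thresholds:
        tp = sum(1 for s, correct in samples if s >= t and correct)
        fp = sum(1 for s, correct in samples if s >= t and not correct)
        points.append((t, fp / negatives, tp / positives))
    return points


def auc(points: List[RocPoint]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_threshold(points: List[RocPoint]) -> float:
    """Threshold maximizing Youden's J = TPR - FPR (one illustrative criterion)."""
    return max(points, key=lambda p: p[2] - p[1])[0]


# Hypothetical scores for seven test utterances.
samples: List[Sample] = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, False), (0.40, False),
]
points = roc_points(samples)
print(f"AUC = {auc(points):.3f}")                    # AUC = 0.917
print(f"best threshold = {best_threshold(points)}")  # best threshold = 0.85
```

One possible way to examine both parameters together is to repeat the sweep for each candidate energy-floor setting and compare the resulting AUCs. The excerpt does not say whether the paper takes this approach.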