Voice Recognizer Research Articles

INTRODUCTION

An Interactive Voice Response (IVR) system is a platform for human-machine interaction by voice or keypad. Examples abound: whenever one calls most large organizations, the first encounter is with a machine that prompts the caller for their intent. Usually, such machines either offer options to choose from (Directed Dialog) or ask for free-form input (Open Dialog). In Open Dialog, there is a risk that the machine does not understand the caller's input, and much investigation goes into determining why. Recognizing keyed input is less challenging than recognizing speech because each key on the keypad produces a unique pair of tone frequencies that cannot be confused with those of another key. This technology is called Dual Tone Multi-Frequency (DTMF)(i), and it is mature because there is little or no variability in the tones a particular key emits. This is not the case with speech, where several variables come into play: whether the caller barges in on a prompt, whether there is background noise at frequencies similar to those of the spoken utterance, and whether the caller is using a cell phone, a speakerphone, or a computer. These and several other factors affect how an IVR system recognizes caller input. This paper attempts to establish guidelines, using ROC analysis, for determining the best settings under which an IVR system should accept a caller's input.

REVIEW OF LITERATURE

Receiver Operating Characteristic (ROC) analysis has been used in medical imaging to measure diagnostic accuracy (Metz, 2008; Pepe, 2000; Griner, Mayewski, Mushlin, & Greenland, 1981). McClish (1989) used this technique to analyze the accuracy of disease diagnosis.
He preferred this technique because it gives the investigator every possible combination of sensitivity and specificity. ROC analysis has also been used in radiology (Metz & Obuchowski, 2003), in biomedical informatics (Lasko, Bhagwat, Zou, & Ohno-Machado, 2005; Brown & Davis, 2006; Hand & Till, 2001), and in Signal Detection Theory (Green & Swets, 1966), where it provides a precise language and graphical notation for analyzing decision-making under uncertainty. ROC curves are used extensively in epidemiology and medical research and are frequently mentioned in conjunction with evidence-based medicine (Zweig & Campbell, 1993). Bond and DePaulo (2006) used ROC analysis to study the accuracy of deception judgments across more than 20,000 judgments and concluded that this analysis correlated strongly with other methods of analysis. In Artificial Intelligence (Fogarty, Baker, & Hudson, 2005), ROC curves have proved useful for evaluating machine learning techniques (Flach, 2004; Fawcett, 2006).

This paper extends the use of ROC analysis to speech recognition. If an utterance is clearly understood (with high or medium confidence), the caller proceeds further down the call flow. If, however, the IVR engine is not certain what the caller said, it re-prompts the caller to confirm that the original intent was correctly identified. If the input is still not clearly understood after the second recognition attempt, the caller is transferred to a live agent. The IVR engine is designed to minimize (and possibly eliminate) the cost of such transfers.

THE ENVIRONMENT

The Interactive Voice Response (IVR) environment consists of a platform for collecting and analyzing caller utterances using a voice recognizer.
The quality of the categorization depends on the recognizer's parameter settings; its two main parameters are the energy floor and the confidence threshold. …
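
The abstract is truncated before the method is described, so the following is only a minimal sketch of how ROC analysis could set a confidence threshold, not the authors' procedure. It assumes a log of utterances with confidence scores in [0, 1], each labelled by whether the recognizer's hypothesis was correct (positive) or wrong (negative). Each candidate threshold yields a true-positive rate (correct recognitions accepted) and a false-positive rate (misrecognitions accepted), and the threshold is chosen by Youden's J statistic (TPR minus FPR). The data, the function names (roc_points, auc, best_threshold, route), and the choice of Youden's J are hypothetical. The energy floor is treated as fixed. route collapses the high/medium confidence bands into a single threshold and applies the re-prompt-then-transfer rule described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RocPoint:
    threshold: float
    tpr: float  # correct recognitions accepted / all correct recognitions
    fpr: float  # misrecognitions accepted / all misrecognitions


def roc_points(scores: List[float], correct: List[bool]) -> List[RocPoint]:
    """Sweep every distinct confidence score as a threshold (accept if score >= t)."""
    if len(scores) != len(correct):
        raise ValueError("scores and labels must have the same length")
    pos = sum(correct)
    neg = len(correct) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both correct and incorrect recognitions")
    points = []
    for t in sorted(set(scores), reverse=True):  # strictest threshold first
        tp = sum(1 for s, c in zip(scores, correct) if s >= t and c)
        fp = sum(1 for s, c in zip(scores, correct) if s >= t and not c)
        points.append(RocPoint(t, tp / pos, fp / neg))
    return points


def auc(points: List[RocPoint]) -> float:
    """Area under the ROC curve by the trapezoidal rule, starting from (0, 0)."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for p in points:  # FPR and TPR never decrease as the threshold drops
        area += (p.fpr - prev_fpr) * (p.tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = p.fpr, p.tpr
    return area


def best_threshold(points: List[RocPoint]) -> float:
    """Threshold maximizing Youden's J = TPR - FPR (ties go to the stricter one)."""
    return max(points, key=lambda p: p.tpr - p.fpr).threshold


def route(confidence: float, threshold: float, attempt: int) -> str:
    """Accept, re-prompt once, then hand the caller off to a live agent."""
    if confidence >= threshold:
        return "accept"
    if attempt < 2:
        return "reprompt"
    return "transfer_to_agent"


if __name__ == "__main__":
    # Hypothetical log: recognizer confidence and whether the hypothesis was correct.
    scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30]
    correct = [True, True, False, True, True, False, False, False]
    pts = roc_points(scores, correct)
    t = best_threshold(pts)
    print(t, auc(pts))  # 0.6 0.875
    print(route(0.45, t, attempt=1), route(0.45, t, attempt=2))  # reprompt transfer_to_agent
```

Youden's J weights missed recognitions and false acceptances equally. If a wrongly accepted utterance costs more than a re-prompt or an agent transfer, a cost-weighted criterion would pick a stricter threshold.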

Capturing accurate food intake data from participants enrolled in nutrition studies is essential for understanding the relationships between diet and chronic disease (1). Numerous methods are used to assess dietary intake, such as food records, 24-hour recalls, and food frequency questionnaires. While each of these techniques is valuable, each carries its own kind of error. The food record requires a motivated participant, is tedious for some, draws attention to the act of eating (thus altering intake), and is difficult for subjects with low literacy skills (2). Interviewing subjects about the previous day's intake avoids the reactivity involved in recording current intake, but it requires the respondent to have good recall, knowledge of food names, and the ability to estimate amounts eaten; it also requires a well-trained interviewer, which makes it a costly process (2, 3). Food frequency questionnaires are limited by their food lists and lack of detail about food preparation, and they require respondents to summarize past intake over many months or the past year. Such instruments are known to contain significant measurement error (4). While all these methods provide valuable information about dietary intake, even modest improvements in methodology would advance our knowledge of how food intake influences health. FIVR (Food Intake Visual and voice Recognizer), a subproject of the Genes, Environment, and Health Initiative from the National Institutes of Health (RFA-CA-07-032 at www.gei.nih.gov/index.asp), is designed to use new digital photography technology to reduce the measurement error associated with a food record. The intent is to create a tool that both increases the accuracy of intake records and reduces the recording burden on respondents. Using a mobile phone with a camera (Figure 1), the participant photographs foods both before and after eating. In this way, the initial portion size is recorded, as well as the portions left uneaten.
The photographs would be used to identify both the types and amounts of foods consumed. This paper briefly describes the technology and techniques involved.

Figure 1. Typical mobile phone interface showing (a) operator instruction screen, (b) menu of available activities, and (c) camera poised to record a meal.

Creating sufficiently detailed images

Capturing images of meals using a mobile phone presents its own challenges. Identifying foods from a picture requires a clear image, and automatically calculating the amount eaten (volume) requires the mobile phone user to take three or more clear images. A single image cannot support estimation of food volume, because 3-dimensional objects must be viewed from more than one angle (5, 6). The three images in Figure 2 were captured from three slightly different angles. A calibration object is also required in the images to determine 3-dimensional size (see Figure 2). The calibration object (fiducial marker) in the Figure 2 images is a card with black and white squares of known size; a standard credit card can also be used to establish the relationship between size in image pixels and the actual size of the object, from which volume in milliliters is derived. Images are also required before and after the meal is eaten to document the volume of food consumed.

Figure 2. Three images captured by moving the camera, using the FIVR mobile phone system.

Image quality hinges on several factors, including resolution (roughly indicated by the number of pixels per image). Higher resolution (more pixels per image) creates larger files, which makes transferring images slower and more prone to failure; testing and refinement of image details is therefore integral to developing a successful system. Camera focus is critical, since the best volumetric estimation is obtained when the three images are in focus and taken with the plate at the same distance from the camera.
With fixed-focus cameras, images will be blurred if not taken at the right distance (which is often too great). With auto-focus cameras, focus is assured, but the user must still maintain the distance. Ways to adjust images to correct for small variations in distance are still being explored.
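
As a rough illustration of how a calibration object fixes the pixel-to-size relationship, the sketch below derives a millimeters-per-pixel scale from a credit card and converts pixel measurements of a food portion into milliliters. The card uses the ISO/IEC 7810 ID-1 format, 85.60 mm × 53.98 mm. The sketch rests on strong assumptions:

- the card and the food lie at the same distance from the camera
- perspective distortion is ignored
- the portion is modelled as a cylinder whose diameter and height are read from two of the views

The function names and the cylinder model are hypothetical. This is not FIVR's volume method, which the excerpt does not describe.

```python
import math

# ISO/IEC 7810 ID-1 card (standard credit card) dimensions in millimeters.
CARD_WIDTH_MM = 85.60
CARD_HEIGHT_MM = 53.98


def mm_per_pixel(card_width_px: float, card_height_px: float, tolerance: float = 0.05) -> float:
    """Average the scale from both card edges; reject a card that looks tilted."""
    if card_width_px <= 0 or card_height_px <= 0:
        raise ValueError("card edges in pixels must be positive")
    sx = CARD_WIDTH_MM / card_width_px
    sy = CARD_HEIGHT_MM / card_height_px
    if abs(sx - sy) / max(sx, sy) > tolerance:
        raise ValueError("card edges disagree; card may be tilted or partly hidden")
    return (sx + sy) / 2


def cylinder_volume_ml(diameter_px: float, height_px: float, scale: float) -> float:
    """Hypothetical shape model: the portion is treated as a cylinder.

    Diameter comes from an overhead view, height from a side view; both are
    assumed to share one scale (same camera distance as the card).
    """
    radius_mm = diameter_px * scale / 2
    height_mm = height_px * scale
    return math.pi * radius_mm ** 2 * height_mm / 1000.0  # 1 mL = 1000 mm^3


def consumed_ml(before_ml: float, after_ml: float) -> float:
    """Volume eaten = before-meal volume minus leftover volume (never negative)."""
    return max(before_ml - after_ml, 0.0)


if __name__ == "__main__":
    scale = mm_per_pixel(428, 269.9)                # about 0.2 mm per pixel
    before = cylinder_volume_ml(400, 150, scale)    # ~80 mm wide, ~30 mm tall
    after = cylinder_volume_ml(400, 50, scale)      # same footprint, ~10 mm left
    print(f"eaten: {consumed_ml(before, after):.1f} mL")  # eaten: 100.5 mL
```

Because the scale enters the volume cubed, a 5% error in the measured card size becomes about a 16% volume error (1.05³ ≈ 1.158). This is consistent with the text's emphasis on keeping the plate at a consistent distance from the camera.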

Articles published on Voice Recognizer

Boosting Sensing Performance of Flexible Piezoelectric Pressure Sensors by Sb Nanosheets and BaTiO3 Nanoparticles Co‐Doping in P(VDF‐TrFE) Nanofibers Mat

Addressing the selection bias in voice assistance: training voice assistance model in python with equal data selection

Vocal Airmail Service for Visually Impaired through Multimedia Approach

『길 위 1번지』 (No. 1 on the Road), a Novel by AI James: 「The Art of the Novel」 and Writing by Artificial Neural Network Algorithms

Virtual safety device for women security

Virtual Friendly Device for Women Security

Speech Recognition Based on Adaptive MFCC and Deep Learning for Use in Embedded Systems

Establishment of Confidence Thresholds for Interactive Voice Response Systems Using ROC Analysis

Human Tracking Methods Comparison for Smart House Method Comparison in Human Tracking for Smart House Purpose

Automatic Food Documentation and Volume Computation Using Digital Imaging and Electronic Transmission

1A2-M11 Control of an Electric Wheelchair by a Voice Recognition Device Using a Parabolic Sound Collector

Using Language Technology to Increase Efficiency and Safety in ATC Communication

User interface for telematics systems

Load-adjusted speech recognition

Voice announcement management system

Speech recognition method and system using compressed speech data

Keep talking—Performance effectiveness with continuous voice recognition for spreadsheet users

Dysarthric speakers' intelligibility and speech characteristics in relation to computer speech recognition

Airborne Message Entry by Voice Recognition
