Research on automatic emotion recognition from speech has recently focused on predicting time-continuous emotion dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. The automatic prediction of such emotions, however, poses several challenges, such as the subjectivity involved in defining a gold standard from a pool of raters and the scarcity of data for training models. In this work, we introduce a novel emotion recognition system based on ensembles of single-speaker regression models. The emotion estimate is obtained by combining a subset of the initial pool of single-speaker regression models, selecting those that are most concordant with one another. The proposed approach allows speakers to be added to or removed from the ensemble without rebuilding the entire recognition system. The simplicity of this aggregation strategy, the flexibility afforded by the modular architecture, and the promising results observed on the RECOLA database highlight the potential of the proposed method in real-life scenarios, and in particular in web-based applications.
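The selection-by-concordance idea described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it assumes the concordance correlation coefficient (CCC, a standard agreement measure for dimensional emotion prediction) as the concordance criterion, an exhaustive search over subsets of a fixed size `k`, and simple averaging as the aggregation; the function names `select_concordant` and `ensemble_estimate` are hypothetical.

```python
import numpy as np
from itertools import combinations

def ccc(x, y):
    """Concordance correlation coefficient between two prediction series."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (vx + vy + (mx - my) ** 2)

def select_concordant(predictions, k):
    """Pick the k single-speaker models whose predictions agree most.

    predictions: list of 1-D arrays, one time-continuous prediction per model.
    Returns the index tuple of the subset with the highest mean pairwise CCC.
    """
    best_subset, best_score = None, -np.inf
    for subset in combinations(range(len(predictions)), k):
        score = np.mean([ccc(predictions[i], predictions[j])
                         for i, j in combinations(subset, 2)])
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset

def ensemble_estimate(predictions, k):
    """Average the predictions of the most concordant subset of models."""
    subset = select_concordant(predictions, k)
    return np.mean([predictions[i] for i in subset], axis=0)
```

Because each single-speaker model is scored and combined independently, adding or removing a speaker only changes the pool passed to `select_concordant`; no retraining of the remaining models is needed, which is the modularity the abstract emphasizes.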