Abstract

Speaker verification by machine alone may be more accurate than verification by a human listener, but it is slower and demands powerful programs and peripherals. Simple recording devices can juxtapose a claimant utterance with a stored sample to provide rapid verification by human judgement, but this raises the question of how to optimize the sample size between insufficient information and an overload of auditory memory. To identify the processes at work in such judgements, a simulation was conducted of the situation where a human operator verifies claimant speakers against stored samples of a standard utterance. Realism was incorporated by restricting signals to telephone frequency bandwidth, while both control and a stringent level of difficulty were incorporated by selecting five better-than-average imposters and five more-than-averagely imitable male speakers. Naive, unselected listeners participated. With a 9-syllable sentence lasting about 2 seconds, correct acceptances varied from 92% to 100% and false acceptances from 54% to 21%. Conditions in which the length of the sample was reduced in various ways gave lower performance. The major factor differentiating the performance of individual subjects was a bias factor: the degree to which “same” responses predominated over “different” responses. Despite this, the different sample conditions tended to produce a fixed percentage of acceptance responses rather than a proportion varying with the available sensitivity in the fashion of an optimal decision-maker. The data justify several conclusions. (1) Listeners can integrate speaker information over periods as long as 2 seconds and probably longer. (2) Improvement in performance can result from increasing the length of either the claimant utterance or the stored sample, even when the other cannot be increased. Thus it appears that listeners are extracting and storing parameters characterising the style of a speaker rather than matching a raw sound image. (3) Speaker verification by skilled listeners should be able to reach levels of sensitivity which, in combination with manipulations of the acceptance criterion, would ensure tolerably low false acceptance rates. (4) Training of the listener in speaker verification should involve training of acceptance criteria as well as perceptual discrimination training.
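
The distinction drawn above between a listener's sensitivity and a listener's bias towards "same" responses can be illustrated with a minimal sketch. It assumes the standard equal-variance Gaussian signal detection model, which the abstract does not say the authors used; the function name `sdt_measures`, the clipping constant `eps`, and the example rates are hypothetical, and the pairing of rates is illustrative, not taken from the study's data.

```python
from statistics import NormalDist

# Standard normal distribution, used for the z-transform of response rates.
_STD_NORMAL = NormalDist()


def sdt_measures(correct_accept_rate, false_accept_rate, eps=0.005):
    """Return (d_prime, criterion) under an equal-variance Gaussian model.

    This model is an assumption made for illustration only.
    d_prime grows with the listener's ability to tell speakers apart.
    criterion is negative when "same" (accept) responses predominate.
    """
    # Clip rates away from 0 and 1 so the inverse normal CDF stays finite
    # (a correct-acceptance rate of exactly 100% would otherwise fail).
    h = min(max(correct_accept_rate, eps), 1.0 - eps)
    f = min(max(false_accept_rate, eps), 1.0 - eps)
    z_h = _STD_NORMAL.inv_cdf(h)
    z_f = _STD_NORMAL.inv_cdf(f)
    d_prime = z_h - z_f
    criterion = -0.5 * (z_h + z_f)
    return d_prime, criterion


if __name__ == "__main__":
    # Illustrative pairs only: the same correct-acceptance rate combined
    # with a high and a low false-acceptance rate.
    for h, f in [(0.92, 0.54), (0.92, 0.21)]:
        d, c = sdt_measures(h, f)
        print(f"accept={h:.2f} false_accept={f:.2f} d'={d:.2f} c={c:.2f}")
```

Under this assumed model, two listeners with the same correct-acceptance rate can differ in both measures: a lower false-acceptance rate gives a larger sensitivity value, and a stronger tendency to answer "same" gives a more negative criterion. This is one way to see why training acceptance criteria is treated separately from perceptual discrimination training.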

