Abstract

Little is known about human and machine speaker discrimination when utterances are very short and the speaking style is variable. This study compares the text-independent speaker discrimination ability of humans and machines on utterances shorter than 2 s in two speaking styles: read sentences and pet-directed speech, which is characterized by exaggerated prosody. Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database served as stimuli. The performance of 65 human listeners was compared to that of i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients (MFCCs), voice quality features inspired by a psychoacoustic model of voice perception, or their combination by score-level fusion. Humans outperformed machines in every condition except style-mismatched pairs from perceptually marked speakers. Speaker representations by humans and machines were compared using multi-dimensional scaling (MDS). Canonical correlation analysis showed only a weak correlation between the machine and human MDS spaces. Multiple regression showed that the means of the voice quality features represented the most important human MDS dimension well, but not the machine dimensions. These results suggest that humans and machines represent speakers differently, and that machine performance might be improved by a better understanding of how acoustic features relate to perceived speaker identity.
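As a rough illustration of the comparison method the abstract describes, the Python sketch below embeds two speaker-dissimilarity matrices with MDS and then measures the correspondence between the resulting human and machine spaces with canonical correlation analysis. This is not the authors' code: the random dissimilarity matrices, the two-dimensional embedding, and all variable names are illustrative assumptions standing in for measured listener judgments and verification scores.

```python
# A minimal sketch of an MDS + CCA comparison between human and machine
# speaker spaces, assuming precomputed pairwise dissimilarity matrices.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_speakers = 50  # 50 female speakers, per the abstract


def random_dissimilarity(n, rng):
    """Placeholder for a measured speaker-dissimilarity matrix
    (e.g., derived from listener judgments or verification scores)."""
    d = rng.random((n, n))
    d = (d + d.T) / 2.0      # symmetrize
    np.fill_diagonal(d, 0.0)  # zero self-dissimilarity
    return d


human_dissim = random_dissimilarity(n_speakers, rng)
machine_dissim = random_dissimilarity(n_speakers, rng)

# Embed each dissimilarity matrix in a low-dimensional MDS space.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
human_space = mds.fit_transform(human_dissim)
machine_space = mds.fit_transform(machine_dissim)

# MDS solutions are defined only up to rotation and reflection, so CCA,
# which is invariant to such transformations, is a natural way to test
# whether the two spaces organize speakers similarly: high canonical
# correlations would indicate similar speaker representations.
cca = CCA(n_components=2)
h_c, m_c = cca.fit_transform(human_space, machine_space)
canon_corrs = [np.corrcoef(h_c[:, i], m_c[:, i])[0, 1] for i in range(2)]
print("Canonical correlations:", np.round(canon_corrs, 3))
```

With real data, weak canonical correlations (as the study reports) would indicate that the human and machine spaces organize the same speakers along largely different dimensions.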
