Acoustic voice variation within and between speakers.

Abstract

Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.
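The analysis described above rests on principal component analysis of standardized acoustic measurements, with each component's share of variance reported as a percentage. A minimal sketch of that calculation follows. It is not the authors' pipeline: the three variables and their values are synthetic stand-ins for the 26 measured variables, and power iteration is used here in place of a full eigendecomposition, so only the first component is recovered.

```python
import math
import random


def standardize(rows):
    """Z-score each column so every variable has mean 0 and variance 1."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]


def first_component(corr, iterations=500):
    """Leading eigenvalue and eigenvector of corr via power iteration."""
    p = len(corr)
    # Uniform start vector; assumed not orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v


# Synthetic "frames": two variables driven by a shared latent factor and one
# independent variable. These are illustrative values, not real voice data.
random.seed(0)
frames = []
for _ in range(200):
    latent = random.gauss(0.0, 1.0)
    frames.append([
        latent + 0.3 * random.gauss(0.0, 1.0),
        latent + 0.3 * random.gauss(0.0, 1.0),
        random.gauss(0.0, 1.0),
    ])

corr = correlation_matrix(standardize(frames))
eigenvalue, loadings = first_component(corr)
# The trace of a correlation matrix equals the number of variables, so the
# share of variance for a component is its eigenvalue divided by that count.
share = eigenvalue / len(corr)
print(f"PC1 accounts for {share:.0%} of the variance")
```

In this toy setup the first component loads mainly on the two correlated variables, which mirrors how a component in the study can be read as reflecting a group of related acoustic measures.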

Similar Papers
  • Research Article
  • Citations: 1
  • 10.1121/1.5102021
Variation in voice quality within speakers
  • Mar 1, 2019
  • The Journal of the Acoustical Society of America
  • Yoonjeong Lee + 1 more

Little is known about the nature or extent of everyday variability in voice quality within a speaker or how this differs across speakers. Using a suite of measures that map between acoustics and perception of voice quality, this study elucidates which acoustic variables within speakers’ individual voice spaces best characterize speakers. Based on studies of faces and cognitive models of speaker recognition, we hypothesized that a few measures would be important across speakers, but that much intra-speaker variability would be idiosyncratic. By using principal component analysis, we tested this hypothesis against a set of multiple sentence productions from 100 native speakers of English (fifty females and fifty males), recorded over three days. Acoustic variables measured every 5 ms on vowels and approximants corresponded to F0, vowel quality, spectral noise, source spectral shape, and variability. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%/males = 22%). Vowel quality and its variability accounted for an additional 12%/12% of variance. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Notably, F0 did not emerge from these analyses. Implications for voice recognition are discussed. [Work supported by NIH/NSF.]

  • Research Article
  • Citations: 3
  • 10.1121/1.5137431
Within- and between-speaker acoustic variability: Spontaneous versus read speech
  • Oct 1, 2019
  • The Journal of the Acoustical Society of America
  • Yoonjeong Lee + 1 more

Using principal component analysis (PCA), our previous study [JASA 145(pt. 2), 1930, (2019)] of read sentences found surprisingly similar acoustic voice spaces for groups of female and male talkers and for the individuals within groups. Formant frequencies and the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most acoustic variance within and across talkers, but many further details varied idiosyncratically for individual talkers. In this study, we replicated this finding using a set of recorded phone conversations from 99/100 original speakers (49 F), hypothesizing that the same measures would characterize both individual and population acoustic spaces, despite greater acoustic variability for spontaneous utterances. F0, formant frequencies, spectral noise, source spectral shape, and their variability were measured every 5 ms from vowels and approximants. Individual and group PCAs revealed that the acoustic voice spaces derived from spontaneous speech are highly similar to those spaces previously identified based on read speech. One significant difference between the two speaking styles was that unlike read speech, variability in F0 emerged as one of the variables that accounted for significant acoustic variability in spontaneous speech. Implications for voice learning, recognition, and discrimination will be discussed. [Work supported by NIH/NSF.]

  • Research Article
  • Citations: 1
  • 10.1121/1.4920824
Within- and between-talker variability in voice quality in normal speaking situations
  • Apr 1, 2015
  • The Journal of the Acoustical Society of America
  • Jody Kreiman + 4 more

Increasing evidence suggests that voices are best thought of as complex auditory patterns, and that listeners perceive and remember voices with reference to a “prototype” or “average” for that talker. Little is known about how, and how much, individual talkers vary their voice quality across situations that arise in every-day speaking, so the nature and extent of variability underlying these abstract averages, and thus the nature of the averages themselves, is unclear. The theoretical relationship between acoustic similarity and confusability in the context of a prototype model also remains unclear. In this preliminary study, 9 tokens of the vowel /a/ were recorded from 5 females on three dates. Measures of F0, spectral slope, HNR, and formant frequencies and their variability were gathered for all voice samples and acoustic distances between talkers were calculated under the assumption that all acoustic variables were equally important perceptually. Perceptual confusability was assessed in a same/different task, and predictions under the equal perceptual weight assumption were tested. Discussion will focus on how much variability is required before a voice sample no longer sounds like the originating talker, and on how the perceptual importance of each acoustical variable varies across talkers and acoustic contexts. [Work supported by NSF and NIH.]

  • Research Article
  • Citations: 2
  • 10.1121/1.5146847
Language effects on acoustic voice variation within and between talkers
  • Oct 1, 2020
  • The Journal of the Acoustical Society of America
  • Yoonjeong Lee + 1 more

Acoustic voice spaces for English speakers are characterized mainly by variability in F0, the balance between higher harmonic amplitudes and inharmonic energy, and higher formant frequencies [JASA 146(4), 3011 (2019)]. We extended this investigation to another language to test the hypothesis that a few biologically relevant measures will emerge commonly across languages, while remaining variance will depend on the structure of the language. This hypothesis was tested against sentence productions from 5 female and 5 male speakers of Seoul Korean. Like English, Korean does not have tone or phonation contrasts, but Seoul Korean exhibits specific phrase intonation patterns. PCAs were performed on scaled values of F0, formant frequencies, spectral noise, source spectral shape, and their variability, measured from vowels and approximants. Results revealed striking similarities between the acoustic voice spaces derived from Korean speakers and those for English speakers. For Korean voices, F0 and variability in lower formant frequencies (i.e., vowel quality) accounted for the most acoustic variance within and across talkers, presumably due to Seoul speakers' systematic use of these measures for phrasal/accentual information. These measures were insignificant for English voices. Our findings suggest that acoustic voice spaces are shaped by both biologically and phonologically relevant factors. [Work supported by NIH/NSF.]

  • Research Article
  • Citations: 1
  • 10.14712/24646830.2022.26
Bilingual acoustic voice variation: the case of Sorani Kurdish-Persian speakers
  • Jan 17, 2023
  • AUC PHILOLOGICA
  • Maral Asiaee + 1 more

Many individuals around the world speak two or more than two languages. This phenomenon adds a fascinating dimension of variability to speech, both in perception and production. But do bilinguals change their voice when they switch from one language to the other? It is typically assumed that while some aspects of the speech signal vary for linguistic reasons, some indexical features remain unchanged across languages. Yet little is known about the influence of language on within- and between-speaker vocal variability. The present study investigated how acoustic parameters of voice quality are structured in two languages of a bilingual speaker and to what extent such features may vary between bilingual speakers. For this purpose, speech samples of 10 simultaneous Sorani Kurdish-Persian bilingual speakers were acoustically analyzed. Following a psychoacoustic model proposed by Kreiman (2014) and using a series of principal component analyses, we found that Sorani Kurdish-Persian bilingual speakers followed a similar acoustic pattern in their two different languages, suggesting that each speaker has a unique voice but uses the same voice parameters when switching from one language to the other.

  • Research Article
  • Citations: 182
  • 10.1016/s0896-6273(00)80824-7
Are cortical models really bound by the "binding problem"?
  • Sep 1, 1999
  • Neuron
  • Maximilian Riesenhuber + 1 more

  • Research Article
  • Citations: 19
  • 10.2747/0272-3638.24.8.691
Quality of Life in Saskatoon 1991 and 1996: A Geographical Perspective
  • Dec 1, 2003
  • Urban Geography
  • James E Randall + 1 more

Interest in the concept of "quality of life" (QOL) has increased exponentially in many areas of public policy. A constant theme in QOL research in the last 30 years has been a focus on the measurement and the types of indicators utilized. The objective of this paper is to identify the structure, spatial variation, and change in quality of life from 1991 to 1996 within Saskatoon, Saskatchewan by using a range of indicators relating to the social and physical environment, modified by perception. The QOL model used was developed by combining aspects of Cutter's (1985) geographical model of quality of life and Myers' (1987) community of quality of life model to assess QOL over time from a geographical perspective. The integration of objective, subjective and perceptual indicators, using a survey of Saskatoon residents, allowed for a broader interpretation of quality of life than is normally the case. A series of principal component analyses consistently identified two important structures of QOL: general affluence and general disadvantage. Results showed that the most disadvantaged residents in 1991 and 1996 were living to the west of the Central Business District, clustered in several neighborhoods, while residents enjoying a higher QOL tended to live in the suburbs toward the periphery of Saskatoon.

  • Research Article
  • Citations: 28
  • 10.1016/j.jvoice.2019.05.012
Acoustic Features of Transfeminine Voices and Perceptions of Voice Femininity
  • Jun 13, 2019
  • Journal of Voice
  • Kimberly L Dahl + 1 more

  • Research Article
  • 10.1121/1.5137587
The role of between- versus within-speaker acoustic variability in vocal identity perception
  • Oct 1, 2019
  • The Journal of the Acoustical Society of America
  • Jody E Kreiman + 1 more

Our recent studies [JASA 145(Pt. 2), 1930, (2019); this conference] show that acoustic spaces characterizing within- and between-speaker variability in voice quality have similar structures, with a few features (acoustic variability and formant dispersion) important for all speakers combined with idiosyncratic features characterizing individual talkers. These findings suggest that voice discrimination should be based on shared features, while “telling voices together” should depend on knowledge of each individual’s vocal idiosyncrasies. To test this hypothesis, we selected a set of voices that varied systematically in acoustic closeness to each other and to the center of the group acoustic space, based on values of shared features. Listeners were presented with multiple samples of each voice and were asked to sort the voices into piles according to perceived speaker identity. Based on prototype models of voice perception, we hypothesized that errors in telling voices apart would be strongly predictable from distances in the group acoustic space, but that errors in telling voices together would not be significantly associated with these distances. Separate analyses will attempt to shed light on the features and strategies involved in this second kind of judgment. [Work supported by NIH/NSF.]

  • Research Article
  • Citations: 82
  • 10.1121/1.402839
Formant frequency discrimination by Japanese macaques (Macaca fuscata)
  • Jun 1, 1992
  • The Journal of the Acoustical Society of America
  • Mitchell S Sommers + 3 more

These studies investigated formant frequency discrimination by Japanese macaques (Macaca fuscata) using an AX discrimination procedure and techniques of operant conditioning. Nonhuman subjects were significantly more sensitive to increments in the center frequency of either the first (F1) or second (F2) formant of single-formant complexes than to corresponding pure-tone frequency shifts. Furthermore, difference limens (DLs) for multiformant signals were not significantly different than those for single-formant stimuli. These results suggest that Japanese monkeys process formant and pure-tone frequency increments differentially and that the same mechanisms mediate formant frequency discrimination in single-formant and vowel-like complexes. The importance of two of the cues available to mediate formant frequency discrimination, changes in the phase and the amplitude spectra of the signals, was investigated by independently manipulating these two parameters. Results of the studies indicated that phase cues were not a significant feature of formant frequency discrimination by Japanese macaques. Rather, subjects attended to relative level changes in harmonics within a narrow frequency range near F1 and F2 to detect formant frequency increments. These findings are compared to human formant discrimination data and suggest that both species rely on detecting alterations in spectral shape to discriminate formant frequency shifts. Implications of the results for animal models of speech perception are discussed.

  • Research Article
  • 10.1159/000546421
A Study of Voice Quality and Acoustic Variability in Sound Prolongation Performance in 5–12-Year-Old Children
  • Jun 9, 2025
  • Folia Phoniatrica et Logopaedica
  • Mridhula Murali + 5 more

Introduction: Voice disorders, or dysphonia, in children impact communication, social interactions, and quality of life, emphasizing the need for effective assessment tools with accurate reference norms. Acoustic measures taken during sound prolongation are widely used to evaluate voice quality, but variability in children’s performance and limited norms from children from diverse backgrounds pose challenges for clinicians. This study investigated voice quality and variability in sound prolongation tasks among 5–12-year-old school children, contributing to the development of acoustic reference data. Method: A total of 275 primary school-aged children in Scotland participated, producing sustained phonations of [a], [s], and [z] to evaluate respiratory and phonatory performance. Durations and acoustic measures, including jitter, shimmer, harmonics-to-noise ratio (HNR), cepstral peak prominence (CPP), and s/z ratio, were analyzed to capture variability in performance. Results: Analysis indicated significant age-related increases in sound prolongation durations, with older children (7–12 years) outperforming younger children (5–6 years), reflecting enhanced respiratory capacity and vocal fold control. While jitter, shimmer, and HNR did not differ significantly across age groups, CPP values were higher in older children, indicating improved vocal stability and harmonic richness. Median s/z ratios also showed significant age-related changes, highlighting developmental changes in phonatory and respiratory coordination. Notably, children exhibited longer average sound prolongation durations than previously reported norms, with considerable variability in performance. No significant sex differences were found, except for the s/z ratio, where females had higher values. 
Conclusion: These findings contribute to and advance the growing body of reference data for assessing voice quality in children and emphasize the importance of factors such as age and sex in large, diverse samples. The study highlights the need to account for developmental variability and to use robust, comprehensive methodologies to contextualize voice quality issues in children.

  • Research Article
  • Citations: 12
  • 10.1017/s0025100319000094
Examining the relationship between vowel quality and voice quality
  • Jul 25, 2019
  • Journal of the International Phonetic Association
  • Christina M Esposito + 2 more

The majority of studies on phonation types have focused on low vowels due to the minimal effects of their first formant on harmonic amplitude. In studies of multiple vowel qualities, reports on the relationship between vowel and voice quality are mixed: some show similar formant frequencies across phonation types (e.g. Abramson, Nye & Luangthongkum 2007, Khan 2012), while others show different formant frequencies depending on voice quality (e.g. Ren 1992, Kuang 2011). Results differ as to whether the degree of non-modal phonation varies (Andruski & Ratliff 2000, Kuang 2011) or does not vary (Esposito 2012, Khan 2012) across different vowel qualities. The present study draws on innovations which allow for more accurate corrections for the effects of formant frequencies on spectral measures (i.e. Hanson 1995, Iseli, Shue & Alwan 2007) to examine the relationship between vowel quality and voice quality, in eight languages – !Xóõ, Burmese, Gujarati, Jalapa de Díaz Mazatec, Mon, Santa Ana del Valle Zapotec, White Hmong, and Yi. While no significant difference in the degree of non-modal phonation due to vowel quality was found, results showed a crosslinguistic pattern in the relationship between vowel quality and voice quality: vowels with higher log(F1) and log(F2) values tended to be produced with creakier phonation, while vowels with lower log(F1) and log(F2) values tended to be produced with breathier phonation, but only on the measure H1*-H2*.

  • Research Article
  • Citations: 3
  • 10.1055/s-2007-998208
Subjektive und objektive Stimmevaluation nach Kehlkopfteilresektion [Subjective and objective voice evaluation after partial laryngectomy]
  • Jul 1, 1990
  • Laryngo-Rhino-Otologie
  • M Ptok + 1 more

There is no generally accepted, standardized approach for evaluation of voice quality and of intelligibility after partial laryngectomy. A voice evaluation which considers some aspects of voice quality is possible by assessing physical and acoustic voice parameters. But this approach does not consider how the patient subjectively assesses his postoperatively altered voice and how the patient believes he is understood by various communication partners. In this study objective and subjective variables of the voice quality of 32 patients with partial laryngectomies were measured. First, selected physical and acoustic variables of voice quality were quantified. Second, subjective criteria of voice quality and of intelligibility were assessed by a questionnaire. A significant correlation between variables of objective and subjective voice quality was found. The maximum vocal intensity, the maximum pitch, and the intensity range correlated significantly with the subjective assessment of intelligibility. No relationship was found between the acoustic variables and the subjectively perceived degree of vocal disability.

  • Research Article
  • Citations: 15
  • 10.1016/j.jvoice.2011.03.005
Septorhinoplasty With Spreader Grafts Enhances Perceived Voice Quality Without Affecting Acoustic Characteristics
  • May 8, 2011
  • Journal of Voice
  • Oner Celik + 5 more

  • Research Article
  • Citations: 15
  • 10.1016/j.jvoice.2006.07.001
Assessment of the Formant Frequencies in Normal and Laryngectomized Individuals Using Linear Predictive Coding
  • Sep 28, 2006
  • Journal of Voice
  • Rehan A Kazi + 6 more
