Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling.
To relate vocal fold structure and kinematics to 2 acoustic measures: cepstral peak prominence (CPP) and the amplitude of the first harmonic relative to the second (H1-H2). The authors used a computational, kinematic model of the medial surfaces of the vocal folds to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: degree of vocal fold adduction, surface bulging, vibratory nodal point, and supraglottal constriction. CPP and H1-H2 were measured from simulated glottal area, glottal flow, and acoustic waveforms and were related to the underlying vocal fold kinematics. CPP decreased with increased separation of the vocal processes, whereas the nodal point location had little effect. H1-H2 increased as a function of separation of the vocal processes in the range of 1.0 mm to 1.5 mm and decreased with separation > 1.5 mm. CPP is generally a function of vocal process separation. H1*-H2* (see paragraph 6 of article text for an explanation of the asterisks) will increase or decrease with vocal process separation on the basis of vocal fold shape, pivot point for the rotational mode, and supraglottal vocal tract shape, limiting its utility as an indicator of breathy voice. Future work will relate the perception of breathiness to vocal fold kinematics and acoustic measures.
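As a rough illustration of the H1-H2 measure named in the abstract above, the sketch below reads the levels of the first two harmonics from a magnitude spectrum and subtracts them. The function name `h1_h2_db`, the `search_hz` tolerance, and the synthetic two-harmonic test tone are assumptions made for illustration, not the authors' procedure, and no correction for vocal tract resonances is applied.

```python
import numpy as np

def h1_h2_db(signal, fs, f0, search_hz=20.0):
    """Level of the first harmonic minus level of the second, in dB (uncorrected)."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    def harmonic_level(k):
        # Strongest spectral component within +/- search_hz of the k-th harmonic.
        band = (freqs >= k * f0 - search_hz) & (freqs <= k * f0 + search_hz)
        return 20.0 * np.log10(spectrum[band].max())

    return harmonic_level(1) - harmonic_level(2)

# Hypothetical test tone: the second harmonic has half the amplitude of the first.
fs = 16000
f0 = 200.0
t = np.arange(fs) / fs  # one second of samples
tone = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)

value = h1_h2_db(tone, fs, f0)
print(round(value, 2))  # roughly 6 dB, since 20*log10(1.0/0.5) is about 6.02
```

A larger H1-H2 means a steeper drop from the first to the second harmonic. In practice f0 would have to be estimated from the recording rather than supplied by hand.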
- Research Article
68
- 10.1044/1092-4388(2012/12-0194)
- Jun 19, 2013
- Journal of Speech, Language, and Hearing Research
In this study, the authors sought to determine (a) how specific vocal fold structural and vibratory features relate to breathy voice quality and (b) the relation of perceived breathiness to 4 acoustic correlates of breathiness. A computational, kinematic model of the vocal fold medial surfaces was used to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: vocal process separation, surface bulging, vibratory nodal point, and epilaryngeal constriction. Twelve naïve listeners rated breathiness of 364 samples relative to a reference. The degree of breathiness was then compared to (a) the underlying kinematic profile and (b) 4 acoustic measures: cepstral peak prominence (CPP), harmonics-to-noise ratio, and two measures of spectral slope. Vocal process separation alone accounted for 61.4% of the variance in perceptual rating. Adding nodal point ratio and bulging to the equation increased the explained variance to 88.7%. The acoustic measure CPP accounted for 86.7% of the variance in perceived breathiness, and explained variance increased to 92.6% with the addition of one spectral slope measure. Breathiness ratings were best explained kinematically by the degree of vocal process separation and acoustically by CPP.
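The "explained variance" figures in the abstract above are the kind of quantity obtained by comparing R² across nested regression models. The following sketch shows that comparison on synthetic data; the predictor ranges, coefficients, and noise level are invented for illustration and do not reproduce the study's data or its reported percentages.

```python
import numpy as np

def r_squared(predictors, response):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    residuals = response - X @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((response - response.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 364  # matches the number of rated samples; every value below is synthetic
separation = rng.uniform(0.0, 3.0, n)   # hypothetical vocal process separation
nodal = rng.uniform(0.0, 1.0, n)        # hypothetical nodal point ratio
bulging = rng.uniform(0.0, 0.2, n)      # hypothetical surface bulging
rating = 2.0 * separation + 1.5 * nodal - 4.0 * bulging + rng.normal(0.0, 0.5, n)

r2_one = r_squared(separation[:, None], rating)
r2_three = r_squared(np.column_stack([separation, nodal, bulging]), rating)
print(f"separation only: {r2_one:.3f}, all three predictors: {r2_three:.3f}")
```

Because the one-predictor model is nested inside the three-predictor model, R² cannot decrease when predictors are added; whether the increase is meaningful is a separate statistical question.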
- Research Article
1
- 10.1002/lary.32452
- Jul 25, 2025
- The Laryngoscope
Objectives: Phonotrauma is believed to result, in part, from elevated vocal fold contact stress associated with increased vocal fold closing speed and vocal hyperfunction. This study aimed to quantify vocal fold vibratory kinematics in phonotrauma, with the hypothesis that closing phase dynamics will be increased in individuals with phonotrauma. Methods: Twenty‐six women with phonotraumatic vocal fold lesions and 29 vocally healthy control participants underwent high‐speed videoendoscopy via a transoral rigid scope while phonating on a sustained /i/. Closing quotient (CQ), speed index (SI), amplitude‐to‐length ratio (ALR), stiffness index (STI), and maximum area declination rate (MADR) were compared between groups using independent t‐tests and Cohen's d effect sizes. Results: Participants with phonotrauma had higher values of ALR (mean [SD], 11.5 [4.3] vs. 8.5 [2.3], p = 0.002, d = 0.86), indicating greater amplitude of vibration compared to controls. Similarly, the MADR was also higher in the phonotrauma group (mean [SD], 1.47 [1.05] vs. 0.51 [0.16] Mpx/s, p < 0.001, d = 1.3), indicating that the vocal fold closing speed was higher in the phonotrauma group. There was no difference in CQ, SI, or STI. Conclusion: These data support the link between phonotraumatic lesions and increased vocal fold contact stress. The findings highlight the maladaptive hyperfunctional cycle that accompanies phonotraumatic lesions and the disruption to vocal efficiency in the setting of phonotrauma. Level of Evidence: 2.
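The effect sizes in the abstract above can be approximated from the reported means and standard deviations. The sketch below uses the pooled-standard-deviation form of Cohen's d and assumes that all 26 and 29 participants contributed to each measure; the abstract states neither detail, so small differences from the published values (for example from rounding of the summary statistics) are to be expected.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Summary statistics as reported in the abstract (phonotrauma vs. control).
d_alr = cohens_d(11.5, 4.3, 26, 8.5, 2.3, 29)     # amplitude-to-length ratio
d_madr = cohens_d(1.47, 1.05, 26, 0.51, 0.16, 29)  # MADR in Mpx/s

print(f"ALR d = {d_alr:.2f}, MADR d = {d_madr:.2f}")
```

Both recomputed values land close to the reported d = 0.86 and d = 1.3, which is a quick consistency check rather than a reanalysis.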
- Research Article
9
- 10.1016/j.jvoice.2020.11.004
- Dec 4, 2020
- Journal of Voice
Impact of Instructed Laryngeal Manipulation on Acoustic Measures of Voice–Preliminary Results
- Research Article
- 10.1044/2026_jslhr-25-00797
- Apr 30, 2026
- Journal of Speech, Language, and Hearing Research
The overall aims of this study were to (a) examine how vocal fold kinematics differ across typical, pressed, and breathy phonation in vocally healthy adults and (b) investigate the relationships between high-speed videoendoscopic-derived kinematic measures and acoustic measures of cepstral peak prominence (CPP) and the amplitude difference between the first two spectral harmonics (H1-H2) and whether the relationships vary by phonation type. Forty vocally healthy adults (32 female, eight male, with a mean age of 26 years) underwent simultaneous transoral rigid high-speed videoendoscopy (HSV; 4,000 frames per second) and acoustic recording during sustained /i:/ in three phonation types: typical, pressed, and breathy. Primary HSV parameters included closing quotient (ClQ), speed index (SI), amplitude-to-length ratio (ALR), stiffness index (STI), and normalized maximum area declination rate (MADRn). Primary acoustic measures were CPP and H1-H2. Mixed analyses of variance were conducted for phonation type differences in HSV parameters with main effects of phonation type, sex, and their interaction. Then, multiple regression models with phonation type interactions were conducted to assess the relationships between HSV and acoustic measures. Relative to typical phonation, simulated pressed phonation showed lower values of ClQ, higher MADRn, and higher STI with large effects, whereas simulated breathy phonation demonstrated higher ClQ and lower MADRn with medium effects. CPP was significantly negatively correlated with ClQ and positively correlated with MADRn, SI, and STI. H1-H2 was significantly positively correlated with ClQ and ALR and negatively correlated with MADRn, SI, and STI. There was a significant phonation type interaction with the correlations between H1-H2 and MADRn, SI, and STI; in each, breathy phonation had a strong, negative relationship and pressed phonation had a small or negligible relationship. 
ClQ consistently correlated with both acoustic measures across all phonation types. Vibratory patterns in pressed phonation were suggestive of increased vocal fold contact stress, as lower ClQ and higher MADRn values suggest more abrupt, faster glottal closure. CPP and H1-H2 can reflect underlying glottal physiology, but their predictive value depends on phonation type in most cases. However, findings suggest that ClQ could be a robust physiological parameter with stable acoustic correlates regardless of phonation type.
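To make the kinematic parameters in the abstract above more concrete, the sketch below derives a closing quotient, a speed index, and a maximum area declination rate from a single cycle of a glottal area waveform. The definitions used here (closing duration divided by period; opening minus closing duration over their sum; steepest frame-to-frame area decrease times frame rate) are common conventions assumed for illustration, since the abstract does not spell them out, and the normalization behind MADRn is not attempted. The twelve-sample cycle is hypothetical.

```python
import numpy as np

def cycle_measures(area, frame_rate):
    """Closing quotient, speed index, and MADR for one glottal cycle.

    `area` holds glottal area samples for one cycle, starting at opening onset.
    """
    area = np.asarray(area, dtype=float)
    n = len(area)
    peak = int(np.argmax(area))
    # First frame at or after the peak where the glottis is closed again.
    closed_after_peak = np.flatnonzero(area[peak:] <= 0)
    closure = peak + int(closed_after_peak[0]) if closed_after_peak.size else n
    opening_frames = peak
    closing_frames = closure - peak
    closing_quotient = closing_frames / n
    speed_index = (opening_frames - closing_frames) / (opening_frames + closing_frames)
    madr = -np.min(np.diff(area)) * frame_rate  # area units per second
    return closing_quotient, speed_index, madr

# Hypothetical cycle: slow opening (6 frames), fast closing (3 frames), then closed.
area = [0, 1, 2, 3, 4, 5, 6, 4, 2, 0, 0, 0]
cq, si, madr = cycle_measures(area, frame_rate=4000)
print(cq, round(si, 3), madr)
```

In this toy cycle the closing phase is short and steep, which gives a low closing quotient and a high declination rate, the pattern the abstract associates with more abrupt glottal closure.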
- Research Article
2
- 10.1002/lary.32390
- Jul 9, 2025
- The Laryngoscope
Objective: To examine the relationship between lesion size, auditory‐perceptual ratings, patient‐related quality of life measure, and acoustic voice measures in children with vocal nodules. Methods: Thirteen children (5–10 years) with vocal nodules were recruited in this cross‐sectional cohort study. Auditory‐perceptual ratings of overall voice severity were performed using a Visual Analog Scale. Acoustic measures of cepstral peak prominence (CPP), low/high ratio, and the Cepstral Spectral Index of Dysphonia (CSID) were computed on vowels /a:/, /i:/, and an all‐voiced sentence. The Nuss scale was used to rate lesion size from high‐speed videoendoscopy (HSV). Patient Voice‐Related Quality of Life (PVRQOL) was obtained from both the child and the parent. Correlations were computed for HSV ratings vs. auditory ratings, PVRQOL ratings, and acoustic measures as well as for interrelationships among all variables. Results: Lesion size on HSV correlated moderately with overall severity of auditory‐perceptual voice rating and with acoustic measures CPP and CSID, but not with PVRQOL. Significant, strong correlations were observed between auditory ratings and CPP, L/H ratio, and CSID in vowels and sentences. Several moderate strength correlations were observed between PVRQOL subscales (parental and child physical function; child social emotional well‐being) and acoustic measures. Conclusions: Since nodules influence both vocal fold structure and vocal function, multiple parameters (lesion size, auditory‐perceptual ratings, measures of physical function and social–emotional impact, and acoustic measurements) are needed to fully characterize the potential effect(s) on the voice. These findings could be used to improve clinical assessment and outcome measurements in children with vocal fold nodules. Level of Evidence: 3.
- Research Article
- 10.1121/10.0042273
- Jan 1, 2026
- The Journal of the Acoustical Society of America
Blunt force trauma to the larynx can cause significant damage, resulting in displaced laryngeal cartilage fractures. Vertical misalignment of the left or right vocal fold (VF) in the inferior-superior direction and scarring of the VF tissue are common outcomes. The influence of inferior-superior VF displacement and VF scarring on phonation was investigated using synthetic, self-oscillating VF models in a physiologically-representative facility. Acoustic, kinematic, and aerodynamic parameters were assessed as a function of inferior-superior vertical displacement and asymmetric VF stiffness. The combination of vertical misalignment and asymmetric VF tissue stiffness became most prominent when the inferior-superior misalignment of the VFs exceeded the thickness of the medial surface. Only a small degree of stiffness asymmetry was tolerated before VF kinematics and acoustics were significantly degraded. The position of the scarred VF relative to the healthy one also influenced outcomes. If the stiffer VF was positioned inferior to the normal VF, phonatory outcomes were poorer than when it was positioned superior to the normal VF. Measures of shimmer and jitter were more than twice as high, while cepstral peak prominence was 3-5 dB lower.
- Research Article
15
- 10.1016/j.jvoice.2020.01.026
- Mar 12, 2020
- Journal of Voice
Relating Cepstral Peak Prominence to Cyclical Parameters of Vocal Fold Vibration from High-Speed Videoendoscopy Using Machine Learning: A Pilot Study
- Research Article
77
- 10.1016/j.jvoice.2009.12.010
- Mar 25, 2010
- Journal of Voice
Cepstral Analysis of Voice in Unilateral Adductor Vocal Fold Palsy
- Research Article
13
- 10.1016/j.jvoice.2023.02.030
- Mar 30, 2023
- Journal of Voice
Auditory-perceptual Parameters as Predictors of Voice Acoustic Measures
- Research Article
38
- 10.1044/2020_jslhr-20-00212
- Nov 13, 2020
- Journal of Speech, Language, and Hearing Research
Objectives: This study aimed to evaluate the fidelity and accuracy of a smartphone microphone and recording environment on acoustic measurements of voice. Method: A prospective cohort proof-of-concept study. Two sets of prerecorded samples (a) sustained vowels (/a/) and (b) Rainbow Passage sentence were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA), at a fixed distance (15 in.). Each set of recordings (iPhone-audio booth, Blue Yeti-audio booth, iPhone-office, and Blue Yeti-office), was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results: The vowel analyses indicated that the jitter, shimmer, HNR, and CPP were significantly different based on microphone choice and shimmer, HNR, and CPP were significantly different based on the recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) have shown differences that fell within their respective normative ranges.
Conclusions: Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.
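Jitter and shimmer, two of the measures compared in the abstract above, quantify cycle-to-cycle variation in period and amplitude. The sketch below uses one commonly cited "local" definition (mean absolute difference between consecutive cycles, divided by the mean, in percent). The helper name and the example period and amplitude values are hypothetical, and the study's analysis program may differ in details such as cycle detection and outlier handling.

```python
import numpy as np

def local_perturbation_percent(values):
    """Mean absolute difference of consecutive values, relative to their mean, in %."""
    values = np.asarray(values, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(values))) / np.mean(values)

# Hypothetical per-cycle measurements from a sustained vowel.
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9]   # cycle durations
peak_amplitudes = [1.0, 0.9, 1.1, 1.0]        # cycle peak amplitudes

jitter = local_perturbation_percent(periods_ms)        # period perturbation
shimmer = local_perturbation_percent(peak_amplitudes)  # amplitude perturbation
print(f"jitter = {jitter:.2f} %, shimmer = {shimmer:.2f} %")
```

Because both measures depend on locating individual cycles, added noise from a microphone or room can change them even when the voice itself is unchanged, which is consistent with the sensitivity to recording conditions that the abstract reports.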
- Research Article
12
- 10.1044/2022_jslhr-22-00502
- Mar 14, 2023
- Journal of Speech, Language, and Hearing Research
Given the importance of inspiratory phonation for assessment of vocal fold structure, the aim of this investigation was to evaluate and describe the vocal fold vibratory characteristics of inspiratory phonation using high-speed videoendoscopy in healthy volunteers. The study also examined the empirical relationship between cepstral peak prominence (CPP) and glottal area waveform measurements derived from simultaneous high-speed videoendoscopy and audio recordings. Vocally healthy adults (33 women, 28 men) volunteered for this investigation and completed high-speed videoendoscopic assessment of vocal fold function for two trials of an expiratory/inspiratory phonation task at normal pitch and normal loudness. Twelve glottal area waveform measures and acoustic CPP values were extracted for analyses. Inspiratory phonation resulted in shorter closing time, longer duration of the opening phase, and faster closing phase velocity compared to expiratory phonation. Sex differences were elucidated. CPP changes for inspiratory phonation were predicted by changes in the glottal area index and waveform symmetry index, whereas changes in CPP during expiratory phonation were predicted by changes in asymmetry quotient, glottal area index, and amplitude periodicity. Vocal fold vibratory differences were identified for inspiratory phonation when compared to expiratory phonation, the latter of which has been studied more extensively. This investigation provides important basic inspiratory phonation data to better understand laryngeal physiology in vivo and provides a basic model from which to further study inspiratory phonation in a larger population representing a broader age range. https://doi.org/10.23641/asha.22223812.
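Cepstral peak prominence, the acoustic measure in the abstract above, is usually described as the height of the dominant cepstral peak above a regression line fitted to the cepstrum. The single-frame sketch below follows that general idea under simplifying assumptions: the function name, the quefrency search range, the fit range starting at 1 ms, and the synthetic pulse-train and noise signals are all illustrative choices, and published implementations differ in windowing, smoothing, and averaging.

```python
import numpy as np

def cpp_db(frame, fs, f0_min=60.0, f0_max=330.0):
    """Simplified cepstral peak prominence (dB) and peak quefrency (s) for one frame."""
    windowed = frame * np.hanning(len(frame))
    log_spectrum = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spectrum)) + 1e-12)
    quefrency = np.arange(len(frame)) / fs

    # Look for the peak only where a plausible pitch period could lie.
    search = (quefrency >= 1.0 / f0_max) & (quefrency <= 1.0 / f0_min)
    peak_idx = np.flatnonzero(search)[np.argmax(cepstrum_db[search])]

    # Regression line over the cepstrum from 1 ms up to the longest period searched.
    fit = (quefrency >= 0.001) & (quefrency <= 1.0 / f0_min)
    slope, intercept = np.polyfit(quefrency[fit], cepstrum_db[fit], 1)
    baseline = slope * quefrency[peak_idx] + intercept
    return cepstrum_db[peak_idx] - baseline, quefrency[peak_idx]

fs, n = 16000, 2048
rng = np.random.default_rng(1)
pulses = np.zeros(n)
pulses[::80] = 1.0                                  # 200 Hz pulse train (hypothetical)
periodic = pulses + 0.01 * rng.standard_normal(n)   # strongly periodic signal
noise = rng.standard_normal(n)                      # aperiodic signal

cpp_periodic, q_periodic = cpp_db(periodic, fs, f0_min=120.0, f0_max=300.0)
cpp_noise, _ = cpp_db(noise, fs, f0_min=120.0, f0_max=300.0)
print(round(cpp_periodic, 1), round(cpp_noise, 1), q_periodic)
```

The periodic signal yields a much more prominent cepstral peak than the noise, located near the 5 ms pitch period, which is why CPP tends to fall as a voice becomes noisier or less periodic.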
- Research Article
11
- 10.1121/10.0003961
- Apr 1, 2021
- The Journal of the Acoustical Society of America
The acoustic measure, relative fundamental frequency (RFF), has been proposed as an objective metric for assessing vocal hyperfunction; however, its underlying physiological mechanisms have not yet been fully characterized. This study aimed to characterize the relationship between RFF and vocal fold kinematics. Simultaneous acoustic and high-speed videoendoscopic (HSV) recordings were collected as younger and older speakers repeated the utterances /ifi/ and /iti/. RFF values at voicing offsets and onsets surrounding the obstruents were estimated from acoustic recordings, whereas glottal angles, durations of voicing offset and onset, and a kinematic estimate of laryngeal stiffness (KS) were obtained from HSV images. No differences were found between younger and older speakers for any measure. RFF did not differ between the two obstruents at voicing offset; however, fricatives necessitated larger glottal angles and longer durations to devoice. RFF values were lower and glottal angles were greater for stops relative to fricatives at voicing onset. KS values were greater in stops relative to fricatives. The less adducted vocal folds with greater KS and lower RFF at voicing onset for stops relative to fricatives in this study were in accordance with prior speculations that decreased vocal fold contact area and increased laryngeal stiffness may decrease RFF.
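Relative fundamental frequency, as discussed in the abstract above, expresses the fundamental frequency of vocal cycles near an obstruent relative to a steady-state reference cycle. The sketch below assumes the usual semitone conversion, 12 times the base-2 logarithm of the frequency ratio; the abstract does not state the formula, and the cycle values and reference are hypothetical.

```python
import math

def rff_semitones(cycle_f0_hz, reference_f0_hz):
    """Fundamental frequency of each cycle relative to a reference, in semitones."""
    return [12.0 * math.log2(f / reference_f0_hz) for f in cycle_f0_hz]

# Hypothetical voicing-onset cycles drifting toward a 200 Hz steady-state reference.
onset_cycles_hz = [212.0, 208.0, 204.0, 201.0, 200.0]
rff = rff_semitones(onset_cycles_hz, reference_f0_hz=200.0)
print([round(st, 2) for st in rff])
```

Values above zero mean the cycle's frequency is higher than the reference, and a value of zero means it matches the reference.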
- Research Article
16
- 10.1044/2023_ajslp-23-00159
- Nov 6, 2023
- American Journal of Speech-Language Pathology
The aims of this study were to determine relationships between vocal effort and (a) acoustic correlates of vocal output and (b) supraglottic compression in individuals with primary muscle tension dysphonia (pMTD) and without voice disorders (controls) in the context of a vocal load challenge. Twenty-six individuals with pMTD and 35 vocally healthy controls participated in a 30-min vocal load challenge. The pre- and postload relationships among self-ratings of vocal effort, various acoustic voice measures, and supraglottic compression (mediolateral and anteroposterior) were tested with multiple regression models and post hoc Pearson's correlations. Acoustic measures included cepstral peak prominence (CPP), low-to-high spectral ratio, difference in intensity between the first two harmonics, fundamental frequency, and sound pressure level (dB SPL). Regression models for CPP and mediolateral compression were statistically significant. Vocal effort, diagnosis of pMTD, and vocal demand were each significant variables influencing CPP measures. CPP was lower in the pMTD group across stages. There was no statistical change in CPP following the vocal load challenge within either group, but both groups had an increase in vocal effort postload. Vocal effort and diagnosis influenced the mediolateral compression model. Mediolateral compression was higher in the pMTD group across stages and had a negative relationship with vocal effort, but it did not differ after vocal loading. CPP and mediolateral supraglottic compression were influenced by vocal effort and diagnosis of pMTD. Increased vocal effort was associated with lower CPP, particularly after vocal load, and decreased mediolateral supraglottic compression in the pMTD group.
- Research Article
3
- 10.1016/j.jvoice.2022.02.015
- Mar 20, 2022
- Journal of Voice
The Relationship Between Pitch Discrimination and Acoustic Voice Measures in a Cohort of Female Speakers
- Research Article
10
- 10.1242/jeb.172247
- Jan 1, 2018
- Journal of Experimental Biology
The complex and elaborate vocalizations uttered by many of the 10,000 extant bird species are considered a major driver in their evolutionary success, warranting study of the underlying mechanisms of vocal production. Additionally, birdsong has developed into a highly productive model system for vocal imitation learning and motor control, where, in contrast to humans, we have experimental access to the entire neuromechanical control loop. In human voice production, complex laryngeal geometry, vocal fold tissue properties, airflow and laryngeal musculature all interact to ultimately control vocal fold kinematics. Quantifying vocal fold kinematics is thus critical to understanding neuromechanical control of voiced sound production, but in vivo imaging of vocal fold kinematics in birds is experimentally challenging. Here, we adapted and tested electroglottography (EGG) as a novel tool for examining vocal fold kinematics in the avian vocal organ, the syrinx. We furthermore imaged and quantified syringeal kinematics in the pigeon (Columba livia) syrinx with unprecedented detail. Our results show that EGG signals predict (1) the relative amount of contact between the avian equivalent of vocal folds and (2) essential parameters describing vibratory kinematics, such as fundamental frequency, and timing of syringeal opening and closing events. As such, EGG provides novel opportunities for measuring syringeal vibratory kinematic parameters in vivo. Furthermore, the opportunity for imaging syringeal vibratory kinematics from multiple planar views (horizontal and coronal) simultaneously promotes birds as an excellent model system for studying kinematics and control of voiced sound production in general, including in humans and other mammals.