A cross-language acoustic space for vocalic phonation distinctions

Abstract

Many languages use phonation types for phonemic or allophonic distinctions. This study examines the acoustic structure of the phonetic space for vowel phonations across languages. Our sample of eleven languages includes languages with contrastive modal, breathy, creaky, lax, tense, harsh, and/or pharyngealized phonations, and languages with allophonic nonmodal phonation on particular tones. In compiling and analyzing this sample we address related issues such as contrast vs. allophony, phonetic similarity across languages, and understanding complex contrasts of several multidimensional phonetic categories via data reduction. Based on extensive acoustic analysis, all of the languages' phonations were mapped into a single phonetic space, which exhibits dispersion (languages with more categories use more of the space). The space is largely two-dimensional, with dimensions that can be interpreted phonetically (e.g. dimension 2 is like a traditional breathy-to-creaky continuum) and also can be related back to the acoustic measures that structure them, thus indicating which acoustic measures are most important across languages.*
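
The abstract describes reducing many acoustic measures to a largely two-dimensional space via data reduction. The paper's exact method is not stated here; as a minimal sketch, principal component analysis (PCA) over standardized per-token acoustic measures is one common way to obtain such a space. The data below are random placeholders, not the study's measurements.

```python
import numpy as np

# Hypothetical acoustic measures per vowel token (rows); columns might be
# spectral-tilt and noise measures (e.g. H1-H2, CPP, HNR) -- placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))

# Standardize each measure, then project onto the top two principal
# components -- one way to build a 2-D "phonation space".
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
space_2d = Xc @ Vt[:2].T          # each token's coordinates in the space
explained = s**2 / np.sum(s**2)   # variance explained per dimension
```

Inspecting the loadings in `Vt[:2]` would then indicate which acoustic measures structure each dimension, mirroring the abstract's interpretation step.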

Similar Papers
  • Research Article
  • Citations: 13
  • 10.1111/2041-210x.13924
A framework for quantifying soundscape diversity using Hill numbers
  • Jul 30, 2022
  • Methods in Ecology and Evolution
  • Thomas Luypaert + 6 more

Soundscape studies are increasingly used to capture landscape‐scale ecological patterns. Yet, several aspects of soundscape diversity remain unexplored. Although some processes influencing acoustic niche usage may operate in the 24‐hr temporal domain, most acoustic indices only capture the diversity of sounds co‐occurring in sound files at a specific time of day. Moreover, many indices do not consider the relationship between the spectral and temporal traits of sounds simultaneously. To provide novel insights into landscape‐scale patterns of acoustic niche usage at broader temporal scales, we present a workflow to quantify soundscape diversity through the lens of trait‐based ecology. Our workflow quantifies the diversity of sound in the 24‐hr acoustic trait space. We introduce the Operational Sound Unit (OSU), a unit of diversity measurement that groups sounds by their shared acoustic properties. Using OSUs and building on the framework of Hill numbers, we propose three metrics that capture different aspects of acoustic trait space usage: (i) soundscape richness, (ii) soundscape diversity and (iii) soundscape evenness. We demonstrate the use of these metrics by (a) simulating soundscapes to assess whether the indices possess a set of desirable behaviours and (b) quantifying soundscape richness and evenness along a gradient in species richness. We demonstrate that (a) the indices outlined herein have desirable behaviours and (b) the soundscape richness and evenness are positively correlated with the richness of sound‐producing species. This suggests that more acoustic niche space is occupied when the species richness is higher. Additionally, species‐poor acoustic communities have a higher proportion of rare sounds and use the acoustic space less evenly. Our workflow generates novel insights into acoustic niche usage at a landscape scale and provides a useful tool for biodiversity monitoring. 
Moreover, Hill numbers can also be used to measure the taxonomic, functional and phylogenetic diversity. Using a common framework for diversity measurement gives metrics a common behaviour, interpretation and standardised unit, thus ensuring comparisons between soundscape diversity and other metrics represent real‐world ecological patterns rather than mathematical artefacts stemming from different formulae.
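
The framework above builds on Hill numbers, the effective number of types of order q: qD = (Σ pᵢ^q)^(1/(1−q)), with the q→1 limit equal to exp(Shannon entropy). A minimal sketch, using hypothetical Operational Sound Unit (OSU) abundances; the paper's exact evenness metric may differ from the common diversity/richness ratio shown here.

```python
import math

def hill_number(abundances, q):
    """Hill number (effective number of types) of order q."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:                      # limit case: exp(Shannon entropy)
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi**q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical OSU abundances (detections per sound unit in trait space).
osus = [50, 30, 15, 4, 1]
richness  = hill_number(osus, 0)   # q=0: plain richness
diversity = hill_number(osus, 1)   # q=1: exp(Shannon)
evenness  = diversity / richness   # one standard Hill-evenness formulation
```

Raising q down-weights rare OSUs, which is why the three metrics capture different aspects of acoustic trait-space usage.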

  • Research Article
  • Citations: 12
  • 10.1109/tbme.2006.883800
Estimation of Vowel Recognition With Cochlear Implant Simulations
  • Jan 1, 2007
  • IEEE Transactions on Biomedical Engineering
  • Chuping Liu + 1 more

Because there are many parameters in the cochlear implant (CI) device that can be optimized for individual patients, it is important to estimate a parameter's effect before patient evaluation. In this paper, Mel-frequency cepstrum coefficients (MFCCs) were used to estimate the acoustic vowel space for vowel stimuli processed by the CI simulations. The acoustic space was then compared to vowel recognition performance by normal-hearing subjects listening to the same processed speech. Five CI speech processor parameters were simulated to produce different degrees of spectral resolution, spectral smearing, spectral warping, spectral shifting, and amplitude distortion. The acoustic vowel space was highly correlated with normal-hearing subjects' vowel recognition performance for parameters that affected the spectral channels and spectral smearing. However, the acoustic vowel space was not significantly correlated with perceptual performance for parameters that affected the degree of spectral warping, spectral shifting, and amplitude distortion. In particular, while spectral warping and shifting did not significantly reshape the acoustic space, vowel recognition performance was significantly affected by these parameters. The results from the acoustic analysis suggest that the CI device can preserve phonetic distinctions under conditions of spectral warping and shifting. Auditory training may help CI patients better perceive these speech cues transmitted by their speech processors.
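
The paper's exact vowel-space index is not given in this abstract; one simple proxy for the spread of an MFCC-based vowel space is the mean pairwise distance between vowel-category centroids. A sketch with hypothetical centroid vectors (real MFCCs would be extracted from the processed stimuli):

```python
import numpy as np
from itertools import combinations

# Hypothetical 13-dimensional MFCC centroids, one per vowel category.
rng = np.random.default_rng(1)
centroids = {v: rng.normal(size=13) for v in "aeiou"}

# Mean pairwise Euclidean distance between vowel centroids: a scalar
# index of how dispersed the acoustic vowel space is; shrinkage under a
# given CI parameter setting would predict poorer vowel recognition.
pairs = list(combinations(centroids, 2))
sep = float(np.mean([np.linalg.norm(centroids[u] - centroids[v])
                     for u, v in pairs]))
```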

  • Book Chapter
  • 10.3233/faia230540
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
  • Sep 28, 2023
  • Ivan Valles-Perez + 4 more

Numerous examples in the literature have shown that deep learning models can work well with multimodal data. Recently, CLIP has enabled deep learning systems to learn shared latent spaces between images and text descriptions, with outstanding zero- or few-shot results in downstream tasks. In this paper we explore the same idea proposed by CLIP but applied to the speech domain, where the phonetic and acoustic spaces usually coexist. We train a CLIP-based model with the aim of learning shared representations of phonetic and acoustic spaces. The results show that the proposed model is sensitive to phonetic changes, with a 91% score drop when replacing 20% of the phonemes at random, while providing substantial robustness against different kinds of noise, with a 10% performance drop when mixing the audio with 75% Gaussian noise. We also provide empirical evidence showing that the resulting embeddings are useful for a variety of downstream applications, such as intelligibility evaluation and the ability to leverage rich pre-trained phonetic embeddings in speech generation tasks. Finally, we discuss potential applications with interesting implications for the speech generation and recognition fields.
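
CLIP-style training scores every phonetic embedding against every acoustic embedding in a batch and pushes matched pairs (the diagonal) above in-batch negatives with a symmetric cross-entropy. A minimal numpy sketch of that objective; the SCRAPS model's actual encoders, temperature, and loss details are not specified in this abstract.

```python
import numpy as np

def clip_style_loss(phon_emb, acou_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched embedding pairs."""
    # L2-normalize, then cosine-similarity logits scaled by temperature.
    p = phon_emb / np.linalg.norm(phon_emb, axis=1, keepdims=True)
    a = acou_emb / np.linalg.norm(acou_emb, axis=1, keepdims=True)
    logits = (p @ a.T) / temperature
    n = len(logits)

    def xent(m):
        # Cross-entropy with the diagonal (matched pairs) as targets.
        m = m - m.max(axis=1, keepdims=True)          # numerical stability
        logp = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average over both retrieval directions (phonetic->acoustic and back).
    return 0.5 * (xent(logits) + xent(logits.T))
```

Correctly matched batches yield a low loss; shuffling one modality (analogous to replacing phonemes at random) raises it, which is the sensitivity the abstract reports.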

  • Conference Article
  • Citations: 21
  • 10.1109/icassp.2017.7953149
Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks
  • Mar 1, 2017
  • Vikramjit Mitra + 7 more

Articulatory information can effectively model variability in speech and can improve speech recognition performance under varying acoustic conditions. Learning speaker-independent articulatory models has always been challenging, as speaker-specific information in the articulatory and acoustic spaces increases the complexity of the speech-to-articulatory space inverse modeling, which is already an ill-posed problem due to its inherent nonlinearity and non-uniqueness. This paper investigates using deep neural networks (DNNs) and convolutional neural networks (CNNs) for mapping speech data into its corresponding articulatory space. Our results indicate that the CNN models perform better than their DNN counterparts for speech inversion. In addition, we used the inverse models to generate articulatory trajectories from speech for three different standard speech recognition tasks. To effectively model the articulatory features' temporal modulations while retaining the acoustic features' spatiotemporal signatures, we explored a joint modeling strategy to simultaneously learn both the acoustic and articulatory spaces. The results from multiple speech recognition tasks indicate that articulatory features can improve recognition performance when the acoustic and articulatory spaces are jointly learned with one common objective function.

  • Research Article
  • Citations: 33
  • 10.1111/2041-210x.13599
A machine learning approach for classifying and quantifying acoustic diversity.
  • Apr 27, 2021
  • Methods in Ecology and Evolution
  • Sara C Keen + 5 more

1. Assessing diversity of discretely varying behavior is a classical ethological problem. In particular, the challenge of calculating an individual's or species' vocal repertoire size is often an important step in ecological and behavioral studies, but a reproducible and broadly applicable method for accomplishing this task is not currently available. 2. We offer a generalizable method to automate the calculation and quantification of acoustic diversity using an unsupervised random forest framework. We tested our method using natural and synthetic datasets of known repertoire sizes that exhibit standardized variation in common acoustic features as well as in recording quality. We tested two approaches to estimate acoustic diversity using the output from unsupervised random forest analyses: (i) cluster analysis to estimate the number of discrete acoustic signals (e.g., repertoire size) and (ii) an estimation of acoustic area in acoustic feature space, as a proxy for repertoire size. 3. We find that our unsupervised analyses classify acoustic structure with high accuracy. Specifically, both approaches accurately estimate element diversity when repertoire size is small to intermediate (5-20 unique elements). However, for larger datasets (20-100 unique elements), we find that calculating the size of the area occupied in acoustic space is a more reliable proxy for estimating repertoire size. 4. We conclude that our implementation of unsupervised random forest analysis offers a generalizable tool that researchers can apply to classify acoustic structure of diverse datasets. Additionally, output from these analyses can be used to compare the distribution and diversity of signals in acoustic space, creating opportunities to quantify and compare the amount of acoustic variation among individuals, populations, or species in a standardized way. We provide R code and examples to aid researchers interested in using these techniques.
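
The second approach above, "acoustic area" as a repertoire proxy, can be illustrated in its simplest 2-D form as the convex hull area of signal points in acoustic feature space. This is only a sketch of the area idea: the paper derives its space from unsupervised random forest proximities, and the coordinates below are hypothetical.

```python
def convex_hull_area(points):
    """Area of the 2-D convex hull (Andrew's monotone chain + shoelace)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

    lower, upper = [], []
    for p in pts:                      # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]

    area = 0.0                         # shoelace formula over hull vertices
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        area += x1*y2 - x2*y1
    return abs(area) / 2.0

# Hypothetical 2-D acoustic-feature coordinates of vocal elements;
# the interior point does not change the occupied area.
elements = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
```

A larger repertoire tends to occupy a larger hull, which is why area scales more gracefully than cluster counts for 20-100 unique elements.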

  • Research Article
  • Citations: 5
  • 10.1080/09524622.2021.1925589
Sympatric bush cricket species co-exist across a complex landscape by optimising both acoustic and ecological space
  • May 21, 2021
  • Bioacoustics
  • Aileen Van Der Mescht + 3 more

A soundscape comprises a mix of species-specific calls in which individuals compete for acoustic space, while differences in vegetation structure filter calls differentially. We focus on an assemblage of bush cricket species in a human-transformed landscape, with a special focus on the seemingly endangered Thoracistus thyraeus. Landscape transformation produces both novel ecological and acoustic spaces in which species must maintain effective communication. Using acoustic activity and species' total call times to characterise their response to the different biotopes in the landscape, we determine how species are distributed across the landscape to optimise ecological and acoustic space. We further investigate the distribution of occupied frequency bands to determine whether species are exposed to potential acoustic interference from other sympatric species. We identified 11 bush cricket species and hypothesised that, where acoustic interference between species is likely, the different species would be found in different biotopes. We found that acoustic interference between species is low, as species co-exist by having distinct ecological resource requirements and inhabiting different biotopes, thus preventing acoustic interference from other species. Acoustic and environmental factors play interactive roles in enabling sympatric species to co-exist across complex landscapes, illustrating that these insects can co-exist without acoustic interference.

  • Research Article
  • 10.15688/jvolsu2.2022.5.3
Vowel variability and its dependence on stress type and phrasal position [Вариативность гласных и ее зависимость от типа ударения и фразовой позиции]
  • Oct 1, 2022
  • Vestnik Volgogradskogo gosudarstvennogo universiteta. Serija 2. Jazykoznanije
  • Sergey Batalin

The position of vowels in acoustic space is described using the values of the F1 and F2 formants. The approach is motivated by the need to perceptually distinguish neighboring vowels. The area occupied by a specific vowel is described as a combination of microfields, with each microfield formed by a set of allophone positions of the vowel in question. The results obtained demonstrate that the variability of the allophone position in the acoustic field can be determined by a number of factors, such as the degree of prominence and vowel position in the phrase. To this end, vowel positioning in the acoustic space in words with neutral and emphatic stress was studied. The speech material for analysis comprised the word 'Stas' embedded in the carrier phrase 'Stas ne byl tihoney' ('Stas was not quiet'), with the target word occupying initial, medial, and final positions in the phrase; in each position the word was pronounced with neutral and emphatic stress. F1 and F2 values of the sound [a] in the word 'Stas' were extracted with the FFT method using the Praat software. The Student paired t-test was employed to assess the significance of differences between the first and second formant frequencies of neutrally and emphatically stressed vowels. The analysis revealed that vowels uttered with emphatic stress are characterized by an expansion of their acoustic vowel space, moving away from the vowel-space center. The displacement occurs through an increase of F1, a decrease of F2, or both. A general trend was observed in the impact of phrasal position on vowel formant frequencies. Though the vowels under neutral stress seemed to display a greater response to the phrasal-position factor than the emphatically stressed ones, noticeable regularities could not be established due to high intradialectal variability among speakers.
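
The study's paired Student t-test compares matched formant measurements across the two stress conditions. A minimal stdlib sketch with hypothetical F1 values (the actual Praat measurements are not reproduced in this abstract):

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired Student t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1  # (t, df)

# Hypothetical F1 values (Hz) of [a] under neutral vs. emphatic stress;
# emphatic stress is expected to raise F1 as the vowel space expands.
f1_neutral  = [640, 655, 630, 660, 645, 650, 635, 648]
f1_emphatic = [700, 690, 705, 720, 685, 710, 695, 702]
t, df = paired_t(f1_emphatic, f1_neutral)
```

Comparing `t` against the critical value for `df` degrees of freedom (about 2.365 for a two-tailed test at p < .05 with df = 7) then gives the significance decision.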

  • Research Article
  • Citations: 18
  • 10.1002/ecy.3380
Revisiting the drivers of acoustic similarities in tropical anuran assemblages.
  • Jun 15, 2021
  • Ecology
  • Larissa Sayuri Moreira Sugai + 3 more

Acoustic signaling is key in mediating mate choice, which directly impacts individual fitness. Because background noise and habitat structure can impair signal transmission, the acoustic space of mixed-species assemblages has long been hypothesized to reflect selective pressures against signal interference and degradation. However, other potential drivers that received far less attention can drive similar outputs on the acoustic space. Phylogenetic niche conservatism and allometric constraints may also modulate species acoustic features, and the acoustic space of communities could be a side-effect of ecological assembly processes involving other traits (e.g., environmental filtering). Additionally, the acoustic space can also reflect the sorting of species relying on public information through extended communication networks. Using an integrative approach, we revisit the potential drivers of the acoustic space by addressing the distribution of acoustic traits, body size, and phylogenetic relatedness in tropical anuran assemblages across gradients of environmental heterogeneity in the Pantanal wetlands. We found the overall acoustic space to be aggregated compared with null expectations, even when accounting for confounding effects of body size. Across assemblages, acoustic and phylogenetic differences were positively related, while acoustic and body size similarities were negatively related, although to a minor extent. We suggest that acoustic partitioning, acoustic adaptation, and allometric constraints play a minor role in shaping the acoustic output of tropical anuran assemblages and that phylogenetic niche conservatism and public information use would influence between-assemblage variation. Our findings highlight an overlooked multivariate nature of the acoustic dimension and underscore the importance of including the ecological context of communities to understand drivers of the acoustic space.

  • Conference Article
  • Citations: 3
  • 10.21437/interspeech.2005-471
Cross-linguistic comparison of two-year-old children's acoustic vowel spaces: contrasting Hungarian with Dutch
  • Sep 4, 2005
  • Krisztina Zajdó + 3 more

Traditional hand-edited formant measurements may result in biased assessment of vowel formants in children’s speech. Therefore, vowel spaces that are constructed by hand-edited formant measures may be unreliable. The recent development of an automated frequency domain analysis method allows for more reliable measurements. Thus, a valid comparison of the size and positioning of young children’s vowel spaces across languages can be achieved. Contrasting the extension of the vowel space utilized by young children acquiring Hungarian and Dutch can provide information pertaining to a) children’s abilities to explore the vowel space and b) potential cross-linguistic differences in exploiting the potentially available acoustic vowel space. Since a unified theory of vowel acquisition has never been developed, it is hoped that the new method will contribute to the creation of such a theory by comparing and contrasting results from diverse languages. Results suggest that two-year-old Hungarian- and Dutch-speaking children utilize the vowel space language-specifically, by exploiting different regions within the potentially available acoustic space.

  • Conference Article
  • Citations: 18
  • 10.21437/interspeech.2008-427
The acoustic to articulation mapping: non-linear or non-unique?
  • Sep 22, 2008
  • Daniel Neiberg + 2 more

This paper studies, statistically, the hypothesis that the acoustic-to-articulatory mapping is non-unique. The distributions of the acoustic and articulatory spaces are obtained by fitting the data to a Gaussian mixture model. The kurtosis is used to measure the non-Gaussianity of the distributions, and the Bhattacharyya distance is used to find the difference between distributions of the acoustic vectors producing non-unique articulator configurations. It is found that stop consonants and alveolar fricatives are generally not only non-linear but also non-unique, while dental fricatives are found to be highly non-linear but fairly unique. Two further investigations are also discussed: the first is on how well the best possible piecewise linear regression is likely to perform; the second is on whether dynamic constraints improve the ability to predict different articulatory regions corresponding to the same region in the acoustic space.
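
For two multivariate Gaussians, the Bhattacharyya distance used above has a closed form: DB = 1/8 (μ1−μ2)ᵀ Σ⁻¹ (μ1−μ2) + 1/2 ln(det Σ / √(det Σ1 · det Σ2)), with Σ = (Σ1+Σ2)/2. A sketch with hypothetical 2-D acoustic clusters (the paper works with full GMM fits, not single Gaussians):

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussian densities."""
    cov = (cov1 + cov2) / 2.0               # averaged covariance
    diff = mu1 - mu2
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    term_cov = 0.5 * np.log(
        np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return term_mean + term_cov

# Identical clusters have distance 0; the distance grows as means separate,
# flagging acoustically distinct vectors that map to the same articulation.
mu, cov = np.zeros(2), np.eye(2)
d_same = bhattacharyya_gaussian(mu, cov, mu, cov)
d_far  = bhattacharyya_gaussian(mu, cov, mu + 4.0, cov)
```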

  • Research Article
  • Citations: 42
  • 10.1016/j.anbehav.2010.08.021
Acoustic niche partitioning in two cryptic sibling species of Chrysoperla green lacewings that must duet before mating
  • Sep 24, 2010
  • Animal Behaviour
  • Charles S Henry + 1 more


  • Research Article
  • Citations: 35
  • 10.11606/issn.2316-9079.v7i2p127-142
Habitat heterogeneity and use of physical and acoustic space in anuran communities in Southeastern Brazil
  • Dec 1, 2008
  • Phyllomedusa: Journal of Herpetology
  • Tiago Da Silveira Vasconcelos + 1 more

We intended to verify whether structural and physiognomic characteristics of water bodies influence the degree of overlap among the calling sites of 23 anuran species, whether anuran species use different calling sites in different water bodies, and whether there is a relationship between the degree of advertisement-call differentiation (based on seven call features) and calling-site differentiation. We determined calling sites (based on four variables) and recorded the advertisement call for the anuran species that occurred in 10 water bodies of northwestern São Paulo State. We also determined the environmental heterogeneity (based on four environmental descriptors) for each water body. Males of most species used similar calling sites in each water body, probably because of the high uniformity of the environment, a consequence of agricultural impacts on the edge vegetation of the studied ponds. Most species (18 of 19 analyzed) called from different sites in the ponds where they occurred, which can be associated with differences in the horizontal and vertical distribution of vegetation in the studied ponds. Of the 19 species analyzed, only males of Pseudopaludicola aff. saltica called from sites with the same characteristics in different ponds. Advertisement calls of Hylidae species were more similar to each other than were those of Leiuperidae and Leptodactylidae. The aquatic/terrestrial anurans (Bufonidae, Leiuperidae, Leptodactylidae and Microhylidae) occupied similar calling sites but presented quite distinct advertisement calls, while Hylidae species presented an inverse pattern: high similarity in advertisement-call features but different calling sites, which indicates niche complementarity between physical (calling-site use) and acoustic (advertisement-call) space use.

  • Research Article
  • Citations: 32
  • 10.1044/1092-4388(2001/098)
Covariation of cochlear implant users' perception and production of vowel contrasts and their identification by listeners with normal hearing.
  • Dec 1, 2001
  • Journal of Speech, Language, and Hearing Research
  • Jennell C Vick + 5 more

This study investigates covariation of perception and production of vowel contrasts in speakers who use cochlear implants and identification of those contrasts by listeners with normal hearing. Formant measures were made of seven vowel pairs whose members are neighboring in acoustic space. The vowels were produced in carrier phrases by 8 postlingually deafened adults, before and after they received their cochlear implants (CI). Improvements in a speaker's production and perception of a given vowel contrast and normally hearing listeners' identification of that contrast in masking noise tended to occur together. Specifically, speakers who produced vowel pairs with reduced contrast in the pre-CI condition (measured by separation in the acoustic vowel space) and who showed improvement in their perception of these contrasts post-CI (measured with a phoneme identification test) were found to have enhanced production contrasts post-CI in many cases. These enhanced production contrasts were associated, in turn, with enhanced masked word recognition, as measured from responses of a group of 10 normally hearing listeners. The results support the view that restoring self-hearing allows a speaker to adjust articulatory routines to ensure sufficient perceptual contrast for listeners.

  • Research Article
  • 10.1121/1.4743680
Speech perception, production, and intelligibility improvements in vowel-pair contrasts among adults who receive cochlear implants
  • Nov 1, 2000
  • The Journal of the Acoustical Society of America
  • Jennell Vick + 5 more

This study investigates relations among speech perception, speech production, and intelligibility in postlingually deaf adults who receive cochlear implants (CI). Measures were made for seven vowel pairs that neighbor in acoustic space from eight postlingually deafened adults, pre- and postimplant. Improvements in a speaker’s production, perception, and intelligibility of a given vowel contrast tended to occur together. Subjects who produced vowel pairs with reduced contrast in the preimplant condition (measured by separation in the acoustic vowel space) and who showed improvement in their perception of these contrasts postimplant (measured with a phoneme identification test) were found to have improved production contrasts post-CI. These enhanced production contrasts were associated with enhanced intelligibility, as measured from responses of a group of normal-hearing listeners. The results support the hypothesis that the implant user’s improving speech perception contributes, at least in part, to that speaker’s improving speech production. [Work supported by the NIDCD, NIH.]

  • Research Article
  • Citations: 46
  • 10.3109/02699206.2015.1012301
Impact of the LSVT on vowel articulation and coarticulation in Parkinson’s disease
  • Feb 17, 2015
  • Clinical Linguistics & Phonetics
  • Vincent Martel Sauvageau + 3 more

The purpose of this study was to investigate the impact of the Lee Silverman Voice Treatment (LSVT®) on vowel articulation and consonant–vowel (C–V) coarticulation in dysarthric speakers with Parkinson’s disease (PD). Nine Quebec French speakers diagnosed with idiopathic PD underwent the LSVT®. Speech characteristics were compared before and after treatment. Vowel articulation was measured using acoustic vowel space and calculated with the first (F1) and second formant (F2) of the vowels /i/, /u/ and /a/. C–V coarticulation was measured using locus equations, an acoustic metric based on the F2 transitions within vowels in relation to the preceding consonant. The relationship between these variables, speech loudness and vowel duration was also analysed. Results showed that vowel contrast increased in F1/F2 acoustic space after administration of the LSVT®. This improvement was associated with the gain in speech loudness and longer vowel duration. C–V coarticulation patterns between consonant contexts showed greater distinctiveness after the treatment. This improvement was associated with the gain in speech loudness only. These results support the conclusions of previous studies investigating the relationship between the LSVT®, speech loudness and articulation in PD. These results expand clinical understanding of the treatment and indicate that loud speech changes C–V coarticulation patterns. Clinical applications and theoretical considerations are discussed.
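
When the acoustic vowel space is calculated from the F1/F2 values of the three corner vowels /i/, /u/ and /a/, as in the study above, it reduces to a triangle area (shoelace formula). A sketch with hypothetical formant values in Hz; the study's actual pre-/post-treatment measurements are not reproduced here.

```python
def vowel_triangle_area(corners):
    """Shoelace area of the F1/F2 triangle spanned by three corner vowels."""
    (x1, y1), (x2, y2), (x3, y3) = corners
    return abs(x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2)) / 2.0

# Hypothetical (F1, F2) pairs in Hz for /i/, /u/, /a/; after treatment the
# corner vowels typically move apart, enlarging the area (greater contrast).
pre  = [(300, 2200), (320, 800), (700, 1300)]
post = [(280, 2400), (300, 750), (780, 1300)]
expanded = vowel_triangle_area(post) > vowel_triangle_area(pre)
```

The area, in Hz², serves as the scalar vowel-contrast index that the study tracks before and after the LSVT®.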
