Unvoiced Frames Research Articles

Sound symbolism refers to non-arbitrary mappings between the sounds of words and their meanings and is often studied by pairing auditory pseudowords such as "maluma" and "takete" with rounded and pointed visual shapes, respectively. However, it is unclear what auditory properties of pseudowords contribute to their perception as rounded or pointed. Here, we compared perceptual ratings of the roundedness/pointedness of large sets of pseudowords and shapes to their acoustic and visual properties using a novel application of representational similarity analysis (RSA). Representational dissimilarity matrices (RDMs) of the auditory and visual ratings of roundedness/pointedness were significantly correlated crossmodally. The auditory perceptual RDM correlated significantly with RDMs of spectral tilt, the temporal fast Fourier transform (FFT), and the speech envelope. Conventional correlational analyses showed that ratings of pseudowords transitioned from rounded to pointed as vocal roughness (as measured by the harmonics-to-noise ratio, pulse number, fraction of unvoiced frames, mean autocorrelation, shimmer, and jitter) increased. The visual perceptual RDM correlated significantly with RDMs of global indices of visual shape (the simple matching coefficient, image silhouette, image outlines, and Jaccard distance). Crossmodally, the RDMs of the auditory spectral parameters correlated weakly but significantly with those of the global indices of visual shape. Our work establishes the utility of RSA for analysis of large stimulus sets and offers novel insights into the stimulus parameters underlying sound symbolism, showing that sound-to-shape mapping is driven by acoustic properties of pseudowords and suggesting audiovisual cross-modal correspondence as a basis for language users' sensitivity to this type of sound symbolism.
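
The RSA comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Euclidean distance used to build each RDM, the rank-based (Spearman-style) correlation between RDMs, and the synthetic `ratings` and `acoustic` arrays are all assumptions made for the example, and no significance test is included.

```python
import numpy as np

def rdm(features):
    """Build an RDM: pairwise Euclidean distances between items (rows)."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]  # treat a 1-D rating vector as one feature per item
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def upper_triangle(matrix):
    """Return the off-diagonal upper triangle, since an RDM is symmetric."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]

def rank(values):
    """Ordinal ranks; ties are broken by position (a simplification)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def rdm_correlation(rdm_a, rdm_b):
    """Rank correlation between the upper triangles of two RDMs."""
    a = rank(upper_triangle(rdm_a))
    b = rank(upper_triangle(rdm_b))
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical data: 20 pseudowords, one perceptual rating and one acoustic measure each.
rng = np.random.default_rng(0)
ratings = rng.uniform(1, 7, size=20)
acoustic = ratings + rng.normal(0, 0.5, size=20)

perceptual_rdm = rdm(ratings)
acoustic_rdm = rdm(acoustic)
r = rdm_correlation(perceptual_rdm, acoustic_rdm)
```

Comparing only the upper triangles avoids counting each pair twice and excludes the all-zero diagonal, which would otherwise inflate the correlation.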

In this paper, we propose a variable-bit-rate speech codec based on mixed excitation linear prediction enhanced (MELPe), with an average bit rate of 2 kbps and a better representation of the excitation signal. By using the psychoacoustic Mel scale, the order of the prediction filter in the MELPe coding architecture is reduced from 10 to 7 without affecting the perceptual quality of the decoded speech. An efficient two-split vector quantization with a weighted Euclidean distance measure is developed for Mel scale-based linear predictive coding (Mel-LPC), and it requires only 18 bits/frame. The instantaneous pitch, or epoch, which is vital for many speech processing applications, is preserved in this codec by including it in the excitation signal used to reconstruct voiced speech. The quantization scheme developed for glottal closure instants (GCIs) increases the bit requirement for voiced frames by 4–25 bits, depending on the position of the GCIs. To compensate, the Mel-LPC order for both silence and unvoiced frames has been brought down to 4 without compromising the perceptual quality of the reconstructed speech. The lowered bit budget is 41 bits/frame for unvoiced frames and 31 bits/frame for silence frames. A further reduction of 10 bits for silence frames is obtained by reducing the number of transmitted parameters and by tuning the quantization bit requirement for each. To categorize the speech frames at the entry of the encoder, a neural network-based voiced/unvoiced/silence classification algorithm using a five-dimensional feature set is created. The experimental results show that the proposed coding scheme operates at an average bit rate of 2 kbps, which is lower than the bit rate of MELPe (2.4 kbps), but with a better perceptual score. In addition, the incorporation of Mel-LPC gives better performance in the estimation of formants and GCIs.
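
The two-split vector quantization with a weighted Euclidean distance mentioned in this abstract can be illustrated with a short sketch. Only the 7-coefficient order and the 18 bits/frame total come from the abstract; the 3 + 4 split, the 9 + 9 bit allocation, the uniform weights, and the random codebooks below are hypothetical choices for the example, not the paper's design.

```python
import numpy as np

def quantize_split(vector, weights, codebooks, splits):
    """Pick, for each sub-vector, the codebook entry with the smallest weighted squared Euclidean distance."""
    indices = []
    for (start, stop), codebook in zip(splits, codebooks):
        sub = vector[start:stop]
        w = weights[start:stop]
        dist = ((codebook - sub) ** 2 * w).sum(axis=1)
        indices.append(int(np.argmin(dist)))
    return indices

def dequantize_split(indices, codebooks):
    """Rebuild the full vector by concatenating the selected codebook entries."""
    return np.concatenate([cb[i] for i, cb in zip(indices, codebooks)])

rng = np.random.default_rng(0)
splits = [(0, 3), (3, 7)]  # assumed 3 + 4 split of the 7 coefficients
bits = [9, 9]              # assumed allocation; only the 18-bit total is from the abstract
codebooks = [
    rng.normal(size=(2 ** b, stop - start))  # random stand-ins for trained codebooks
    for b, (start, stop) in zip(bits, splits)
]
weights = np.ones(7)       # uniform weights; a real codec would use perceptual weights

vector = rng.normal(size=7)  # stand-in for one frame's Mel-LPC parameter vector
indices = quantize_split(vector, weights, codebooks, splits)
decoded = dequantize_split(indices, codebooks)
```

Splitting the vector keeps the search small: two codebooks of 512 entries each are searched instead of one codebook of 2^18 entries, at the cost of ignoring dependencies between the two halves.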

Articles published on Unvoiced Frames

Alaryngeal Speech Enhancement for Noisy Environments Using a Pareto Denoising Gated LSTM

"Lombard Effect" and Voice Changes in Adductor Laryngeal Dystonia: A Pilot Study.

Investigating temporal and prosodic markers in clinical high-risk for psychosis participants using automated acoustic analysis.

Voice Activity Detection: Fusion of Time and Frequency Domain Features with A SVM Classifier

More on Sibilant Devoicing in Spanish Diachrony: An Initial Phonetic Approach

Association of Bio-acoustic Features of Vocal Signals with Age and Semen Quality in Sahiwal Bulls

Extraction of Voiced Regions of Speech from Emotional Speech Signals Using Wavelet-Pitch Method

Mel Scale-Based Linear Prediction Approach to Reduce the Prediction Filter Order in CELP Paradigm

Stimulus Parameters Underlying Sound-Symbolic Mapping of Auditory Pseudowords to Visual Shapes.

Design of MELPe-Based Variable-Bit-Rate Speech Coding with Mel Scale Approach Using Low-Order Linear Prediction Filter and Representing Excitation Signal Using Glottal Closure Instants

Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition

Recognition of isolated digits using DNN–HMM and harmonic noise model

Discrimination of karan fries cow’s individuality by the mean of their vocal acoustic features

Multiple Time-Instances Features of Degraded Speech for Single Ended Quality Measurement

Acoustic features of vocalization during different phases of estrous cycle in Murrah buffaloes

Sparse Representations for Single Channel Speech Enhancement Based on Voiced/Unvoiced Classification

A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

Automatic detection of Parkinson's disease in running speech spoken in three different languages

Bio-acoustic: A non-invasive and effective sensing technique in monitoring of dairy buffaloes

Use of baseband phase structure to improve the performance of current speech enhancement algorithms
