Vocal Tract Response Research Articles

The fundamental frequency feature is essential for Automatic Speech Recognition because its patterns convey a paralanguage and its tuning normalizes other speech features. Human speech is multidimensional: it is minimally represented by three variables, the intonation (or pitch), the formants (or timbre), and the speech resolution (or depth). These variables represent the hidden states of the local glottal variation, the vocal tract response, and the frequency scale, respectively. Computing them one by one is less efficient than computing them together, so this article introduces a new speech feature extraction approach. The article is introductory: it focuses on the basic concepts of our new approach and does not elaborate on all of its applications. It demonstrates that the unit of a cepstral value, which is a spectral value of spectra, is a unit of acceleration, since its discrete variable, the quefrency, can be expressed in Hertz-per-microsecond. The article shows how to produce refined voice analysis from robust estimates and how to reconstruct speech signals from feature spaces, and it concludes that the pitch track of the new approach is as good as those of two open-source pitch extractors. We introduce the Speech Quefrency Transform (SQT) approach, which combines multiple processes, attenuates background noise, and enables distant-speech recognition, together with multiple quefrency scales. SQT is a set of frequency transforms whose spectral leakages are controlled according to a frequency-modulation model. SQT maps the stationarity of time series onto a hyperspace that, when reduced for pitch-track extraction, resembles the cepstrogram.
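The abstract builds on the cepstrum, a "spectrum of a spectrum" whose variable is the quefrency, and compares its pitch track with existing extractors. The sketch below shows only the conventional real-cepstrum pitch estimate, not SQT, whose details are not given here: window a frame, take the log-magnitude spectrum, transform it again, and pick the highest peak among quefrencies that correspond to plausible pitch periods. Quefrency is measured in samples (the conventional unit); the Hertz-per-microsecond scale proposed in the article is not reproduced. The function names, the Hann window, and the 60 to 400 Hz search range are illustrative assumptions, and a small pure-Python FFT is included so the block needs no third-party packages.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum of a Hann-windowed frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    # Small floor avoids log(0) in empty spectral bins.
    log_mag = [math.log(abs(c) + 1e-12) for c in fft(windowed)]
    # log_mag is real, so its inverse DFT equals the conjugate of its forward DFT
    # divided by n; taking the real part makes the conjugation irrelevant.
    return [c.real / n for c in fft(log_mag)]


def cepstral_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 from the highest cepstral peak among plausible pitch periods."""
    ceps = real_cepstrum(frame)
    q_lo = int(sample_rate / f_max)  # shortest period searched, in samples
    q_hi = min(int(sample_rate / f_min), len(frame) // 2 - 1)  # longest period searched
    peak = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return sample_rate / peak  # quefrency in samples -> frequency in Hz


if __name__ == "__main__":
    fs = 16000
    # Synthetic voiced frame: equal-amplitude harmonics of 200 Hz up to 7.8 kHz.
    frame = [sum(math.cos(2 * math.pi * 200.0 * h * i / fs) for h in range(1, 40))
             for i in range(1024)]
    print(f"estimated F0: {cepstral_pitch(frame, fs):.1f} Hz")
```

A periodic excitation turns into a periodic ripple in the log spectrum, so the second transform concentrates it at the quefrency equal to the pitch period; restricting the search range keeps the low-quefrency vocal tract envelope from being mistaken for the pitch peak.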


Human speech consists mainly of three components: a glottal signal, a vocal tract response, and a harmonic shift. These correlate, respectively, with the intonation (pitch), the formants (timbre), and the speech resolution (depth). Adding the intonation of the Fundamental Frequency (FF) to Automatic Speech Recognition (ASR) systems is necessary for three reasons. First, the intonation conveys a primitive paralanguage. Second, tuning it to the speaker reduces background noise and clarifies acoustic observations. Third, the speech features are extracted more efficiently when they are computed together. This work introduces a frequency-modulation model; a novel quefrency-based speech feature extraction named the Speech Quefrency Transform (SQT); and its own quefrency scaling and transformation function. The cepstra, which are spectra of spectra, are proposed to be expressed in units of acceleration, whereby their discrete variable, the quefrency, is measured in Hertz-per-microsecond. The extracted features, which are comparable to Mel-Frequency Cepstral Coefficients (MFCC), are integrated within a quefrency-based pitch tracker. The SQT directly expands time samples of stationary signals (i.e., speech) into a higher-dimensional space, which can help generative Artificial Neural Networks (ANNs) in unsupervised Machine Learning and Natural Language Processing (NLP) tasks. The proposed methodologies form a scalable solution for refined speech and cepstral analysis that is compatible with dynamic and parallel programming; they can robustly estimate the features after applying a matrix multiplication in fewer than a hundred sub-bands, saving computational resources.
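The abstract states that the features can be estimated after a matrix multiplication in fewer than a hundred sub-bands and compares them to MFCC. The SQT matrix itself is not specified here, so the sketch below only illustrates the general pattern under stated assumptions: a fixed analysis matrix is precomputed once (hypothetically, 40 Hann-windowed complex exponentials centred on mel-spaced frequencies, a simplification of the triangular filter bank used for MFCC), each frame is multiplied by it, and the log sub-band energies are converted to cepstral coefficients with a DCT-II, as in the MFCC recipe. All function names and parameter values are illustrative.

```python
import cmath
import math


def mel(f_hz):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def inv_mel(m):
    """Inverse of mel()."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def analysis_matrix(n, sample_rate, n_bands=40, f_lo=60.0, f_hi=None):
    """Precompute an n_bands x n complex matrix. Each row is a Hann-windowed complex
    exponential at one centre frequency; centres are evenly spaced on the mel scale.
    Hypothetical stand-in for a sub-band transform matrix, not the SQT."""
    f_hi = sample_rate / 2.0 if f_hi is None else f_hi
    m_lo, m_hi = mel(f_lo), mel(f_hi)
    centres = [inv_mel(m_lo + (m_hi - m_lo) * b / (n_bands - 1)) for b in range(n_bands)]
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    rows = [[window[i] * cmath.exp(-2j * math.pi * f * i / sample_rate) for i in range(n)]
            for f in centres]
    return centres, rows


def subband_log_energies(rows, frame):
    """Matrix-vector product (one inner product per row), then log power per sub-band."""
    return [math.log(abs(sum(w * s for w, s in zip(row, frame))) ** 2 + 1e-12)
            for row in rows]


def dct_ii(values, n_coeffs=13):
    """Unnormalised DCT-II of the log energies, giving MFCC-style cepstral coefficients."""
    k = len(values)
    return [sum(v * math.cos(math.pi * c * (b + 0.5) / k) for b, v in enumerate(values))
            for c in range(n_coeffs)]


if __name__ == "__main__":
    fs, n = 16000, 512
    centres, rows = analysis_matrix(n, fs)        # computed once
    frame = [math.cos(2 * math.pi * 1000.0 * i / fs) for i in range(n)]
    energies = subband_log_energies(rows, frame)  # per frame: 40 x 512 multiply-adds
    coeffs = dct_ii(energies)
    loudest = max(range(len(energies)), key=energies.__getitem__)
    print(f"loudest band centre: {centres[loudest]:.0f} Hz, c0 = {coeffs[0]:.2f}")
```

Because the matrix does not depend on the signal, its cost is paid once; each new frame then costs n_bands × n multiply-adds, and the rows are independent, so they can be evaluated in parallel.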




Articles published on Vocal Tract Response

Multi-Dimensional Spectral Process for Cepstral Feature Engineering & Formant Coding

A Robust Speech Features Extractor & Reconstructor For Artificial Intelligence Frontends

Effects of Nasalization on Vocal Tract Response Curve

High Security and Capacity of Image Steganography for Hiding Human Speech Based on Spatial and Cepstral Domains

Demodulation of Narrowband Speech Spectrograms Using the Riesz Transform

Algorithm for Cepstral Analysis and Homomorphic Filtering for Glottal Feature Estimation in Speech Signals

Glottal source processing: From analysis to applications

Cepstral and linear prediction techniques for improving intelligibility and audibility of impaired speech

Voice Production Mechanisms of Vocal Vibrato in Male Singers

Effect of the glottal source and the vocal tract on the partials amplitude of vibrato in male voices

Vocal-tract computation: How to make it robust and fast

Vocal tract response to toxic injury: Clinical issues

Glottal inverse filtering by joint estimation of an AR system with a linear input model

Automatic glottal inverse filtering from speech and electroglottographic signals

Formant frequencies, bandwidths, and Qs in helium speech.
