Vocal Tract Characteristics Research Articles

In this paper, explicit and implicit modelling of subsegmental excitation information are experimentally compared. For explicit modelling, the static and dynamic values of the standard Liljencrants–Fant (LF) parameters that model the glottal flow derivative (GFD) are used. A simplified approximation method is proposed to compute these LF parameters by locating the glottal closing and opening instants. The proposed approach significantly reduces the computation needed to implement the LF model. For implicit modelling, linear prediction (LP) residual samples taken in blocks of 5 ms with a shift of 2.5 ms are used. Speaker recognition studies are performed on the NIST-99 and NIST-03 databases. For speaker identification, implicit modelling provides significantly better performance than explicit modelling. For speaker verification, in contrast, explicit modelling seems to provide better performance. This indicates that explicit modelling seems to have relatively less intra- and inter-speaker variability, whereas implicit modelling has more of both. What is desirable is less intra-speaker and more inter-speaker variability. Therefore, explicit modelling may be used for the speaker verification task and implicit modelling for the speaker identification task. Further, for both speaker identification and verification, explicit modelling provides relatively more complementary information to the state-of-the-art vocal tract features. The contribution of the explicit features is also relatively more robust against noise. We suggest that the explicit approach can be used to model the subsegmental excitation information for speaker recognition.
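The implicit representation described in this abstract (LP residual samples in 5 ms blocks with a 2.5 ms shift) can be illustrated with a minimal sketch. The abstract does not give the LP order, sampling rate, or analysis window, so the values below (order 10, 8 kHz, a single LP analysis over the whole signal) are assumptions chosen for brevity, and the function names are hypothetical. A practical system would more likely run the LP analysis frame by frame.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns a[0..order-1] for lags 1..order,
    so that x[n] is predicted as sum_k a[k] * x[n - 1 - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or numerically degenerate input: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(x, order=10):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n - 1 - k].
    Assumption: one LP analysis over the whole signal (a simplification)."""
    a = levinson_durbin(autocorrelation(x, order), order)
    residual = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(order) if n - 1 - k >= 0)
        residual.append(x[n] - pred)
    return residual


def block_residual(residual, fs=8000, block_ms=5.0, shift_ms=2.5):
    """Cut the residual into overlapping blocks (5 ms long, 2.5 ms shift)."""
    size = int(round(fs * block_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    return [residual[s:s + size] for s in range(0, len(residual) - size + 1, shift)]


if __name__ == "__main__":
    # Synthetic stand-in for speech: a sinusoid plus a little noise.
    rng = random.Random(0)
    fs = 8000
    signal = [math.sin(2 * math.pi * 120 * n / fs) + 0.01 * rng.gauss(0, 1)
              for n in range(fs)]
    blocks = block_residual(lp_residual(signal), fs=fs)
    print(len(blocks), "blocks of", len(blocks[0]), "samples")
```

At 8 kHz each block holds 40 samples and consecutive blocks overlap by 20, so each block covers less than one pitch period for typical voices, which is what makes the representation subsegmental.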

We propose a pitch-synchronous approach to designing a voice conversion system that takes into account the correlation between the excitation signal and the vocal tract system characteristics of the speech production mechanism. The glottal closure instants (GCIs), also known as epochs, are used as anchor points for analysis and synthesis of the speech signal. The Gaussian mixture model (GMM) is considered to be the state-of-the-art method for vocal tract modification in a voice conversion framework. However, GMM-based models generate overly smooth utterances and need to be tuned according to the amount of available training data. In this paper, we propose a support vector machine multi-regressor (M-SVR) based model that requires fewer tuning parameters to capture a mapping function between the vocal tract characteristics of the source and the target speaker. The prosodic features are modified using an epoch-based method and compared with the baseline pitch synchronous overlap and add (PSOLA) based method for pitch and time scale modification. The linear prediction residual (LP residual) signal corresponding to each frame of the converted vocal tract transfer function is selected from the target residual codebook using a modified cost function. The cost function is calculated from the mapped vocal tract transfer function and its dynamics, along with the minimum residual phase, pitch period and energy differences with the codebook entries. The LP residual signal corresponding to the target speaker is generated by concatenating the selected frame and its previous frame so as to retain the maximum information around the GCIs. The proposed system is also tested using a GMM-based model for vocal tract modification. The average mean opinion score (MOS) and ABX test results are 3.95 and 85 for the GMM-based system and 3.98 and 86 for the M-SVR based system, respectively.
The subjective and objective evaluation results suggest that the proposed M-SVR based model for vocal tract modification, combined with modified residual selection and the epoch-based model for prosody modification, can provide a good-quality synthesized target output. The results also suggest that the proposed integrated system performs slightly better than the GMM-based baseline system designed using either the epoch-based or the PSOLA-based model for prosody modification.
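The residual selection step described in this abstract can be pictured as a nearest-entry search over the target residual codebook. The sketch below is only an illustration under stated assumptions: the abstract names the terms of the cost (vocal tract features, their dynamics, residual phase, pitch period, energy) but not how they are represented or weighted, so the squared-difference form, the equal weights, and all class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FrameFeatures:
    """Per-frame descriptors; the concrete representations are assumptions."""
    vt: Sequence[float]        # vocal tract feature vector
    vt_delta: Sequence[float]  # frame-to-frame dynamics of vt
    phase: Sequence[float]     # residual phase descriptor
    pitch_period: float        # pitch period, e.g. in samples
    energy: float              # frame energy


@dataclass
class CodebookEntry:
    features: FrameFeatures
    residual: List[float]      # stored target-speaker LP residual frame


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def selection_cost(query: FrameFeatures, cand: FrameFeatures,
                   weights: Tuple[float, ...] = (1.0, 1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of squared differences over the five terms.
    Assumption: equal weights; the abstract does not specify them."""
    terms = (
        sq_dist(query.vt, cand.vt),
        sq_dist(query.vt_delta, cand.vt_delta),
        sq_dist(query.phase, cand.phase),
        (query.pitch_period - cand.pitch_period) ** 2,
        (query.energy - cand.energy) ** 2,
    )
    return sum(w * t for w, t in zip(weights, terms))


def select_residual(query: FrameFeatures,
                    codebook: Sequence[CodebookEntry]) -> CodebookEntry:
    """Return the codebook entry with the lowest selection cost."""
    return min(codebook, key=lambda e: selection_cost(query, e.features))
```

In such a scheme the query features would come from the mapped (converted) vocal tract frame, and the returned entry supplies the residual frame used for synthesis. In practice the terms would need scaling so that no single one, such as the pitch period, dominates the sum.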

Articles published on Vocal Tract Characteristics

Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data

Individual identification of Japanese macaques by coo-calls: Pitch or vocal tract characteristics?

A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information

Acoustic discrimination of healthy swallows from upper airway movements

Multi-subject atlas built from structural tongue magnetic resonance images

Compensation for vocal tract characteristics across native and non-native languages

Improvement of Electrolaryngeal Speech Quality Using a Supraglottal Voice Source With Compensation of Vocal Tract Characteristics

An Investigation of Vocal Tract Characteristics for Acoustic Discrimination of Pathological Voices

The effects of prior access to talker information on vowel identification in single- and mixed-talker contexts

Comparing ANN and GMM in a voice conversion framework

A pitch synchronous approach to design voice conversion system using source-filter correlation

Average framing linear prediction coding with wavelet transform for text-independent speaker identification system

The role of vocal-tract characteristics and pitch patterns in identifying individuality in Coo calls of Japanese macaques

Development of vocal tract and acoustic features in children.

Development of vocal tract and acoustic features in children

Listening to different speakers: On the time-course of perceptual compensation for vocal-tract characteristics

A test of formant frequency analyzes with simulated child-like vowels.

Robust Speaker Recognition Using Denoised Vocal Source and Vocal Tract Features

Two stage emotion recognition based on speaking rate

Linear predictive analysis for ultrasonic speech
