Formant Tracking Research Articles

This paper presents an experimental evaluation of different features for use in speaker identification. The features are tested on speech data from the CHAINS corpus in a closed-set speaker identification task. The main objective of the paper is to present a novel parametrization of speech that is based on the AM-FM representation of the speech signal and to assess the utility of these features in the context of speaker identification. In order to explore the extent to which different instantaneous frequencies due to the presence of formants and harmonics in the speech signal may predict a speaker's identity, this work evaluates three different decompositions of the speech signal within the same AM-FM framework: a first setup has been used previously for formant tracking, a second setup is designed to enhance familiar resonances below 4000 Hz, and a third setup is designed to approximate the bandwidth scaling of the filters conventionally used in the extraction of Mel-frequency cepstral coefficients (MFCCs). From each of the proposed setups, parameters are extracted and used in a closed-set text-independent speaker identification task. The performance of the new featural representation is compared with results obtained using MFCC and RASTA-PLP features in the context of a generic Gaussian mixture model (GMM) classification system. In evaluating the novel features, we look selectively at information for speaker identification contained in the frequency ranges 0-4000 Hz and 4000-8000 Hz, as the instantaneous frequencies revealed by the AM-FM approach suggest the presence of structures not well known from conventional spectrographic analyses. The new parametrization achieves accuracy comparable to that of conventional MFCC parameters within the same reference system when trained and tested on modally voiced speech that is mismatched in both channel and style. When the testing material is whispered speech, the new parameters provide better results than any of the other features tested, although they remain far from ideal in this limiting case.

A new formant-tracking algorithm using phoneme information is proposed. Conventional formant-tracking algorithms obtain formant tracks by analyzing the acoustic speech signal using continuity constraints without any additional information. The formant-tracking error rate of the conventional methods is reportedly in the range of 10%-20%. In this paper, we show that if a text or phoneme transcription of the speech utterances is available, the error rate can be significantly reduced. The basic idea behind this approach is that, given the phoneme identity, a formant-tracking algorithm has a better indication of where to look for formants. The algorithm consists of three phases: 1) analysis, 2) segmentation and alignment, and 3) formant tracking by the Viterbi search algorithm. In the analysis phase, formant candidates are obtained for each analysis frame by solving the linear prediction polynomial. In the segmentation and alignment phase, the text corresponding to the input speech utterance is converted into a sequence of phoneme symbols. Then, the phoneme sequence is time aligned with the speech utterance. A hidden Markov model (HMM) based automatic segmentation algorithm is used for forced time alignment. Nominal formant frequencies are assigned at the center of each phoneme segment. Nominal formant tracks for the entire utterance are then obtained by interpolating the nominal formant frequencies. In order to compensate for the coarticulation effect, different interpolation methods are used depending on the phonemic context. The interpolation process makes the formant-tracking algorithm robust to possible segmentation errors made by the HMM-based segmentation algorithm. As a result, the proposed formant-tracking algorithm does not require highly accurate alignment/segmentation. Finally, a set of formants is chosen from the formant candidates in such a way that the resulting formant tracks come close to the nominal formant tracks while satisfying the continuity constraints. The algorithm is tested using natural speech utterances, and its performance is compared against formant tracks obtained by the conventional method using continuity constraints only. The new algorithm significantly reduces the formant-tracking error rate (5.03% for male and 3.73% for female speakers) relative to the conventional formant-tracking algorithm (13.00% for male and 15.82% for female speakers).
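The final phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: it tracks a single formant, uses plain linear interpolation for the nominal track (the abstract says the interpolation method depends on phonemic context), and uses absolute frequency differences as costs. The function names, the weights w_nominal and w_cont, and the candidate values are all assumptions made for illustration.

```python
def interpolate_nominal(anchors, n_frames):
    """Linearly interpolate (frame_index, freq_hz) anchors over n_frames.

    Assumption: anchors are sorted by frame index and stand in for the
    nominal formant frequencies placed at phoneme-segment centers.
    """
    track = []
    for t in range(n_frames):
        if t <= anchors[0][0]:
            track.append(float(anchors[0][1]))
        elif t >= anchors[-1][0]:
            track.append(float(anchors[-1][1]))
        else:
            for (t0, f0), (t1, f1) in zip(anchors, anchors[1:]):
                if t0 <= t <= t1:
                    w = (t - t0) / (t1 - t0)
                    track.append(f0 + w * (f1 - f0))
                    break
    return track


def viterbi_track(candidates, nominal, w_nominal=1.0, w_cont=1.0):
    """Choose one candidate per frame by dynamic programming.

    The path cost is the weighted sum of the distance to the nominal track
    and the frame-to-frame jump (the continuity constraint).
    """
    cost = [[w_nominal * abs(c - nominal[0]) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        row_cost, row_back = [], []
        for c in candidates[t]:
            # Cheapest way to reach candidate c from any previous candidate.
            totals = [cost[t - 1][j] + w_cont * abs(c - prev[j])
                      for j in range(len(prev))]
            j_best = min(range(len(prev)), key=totals.__getitem__)
            row_cost.append(totals[j_best] + w_nominal * abs(c - nominal[t]))
            row_back.append(j_best)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    path.reverse()
    return path


# Hypothetical data: each frame has a true candidate and a flat spurious one.
nominal = interpolate_nominal([(0, 300.0), (4, 700.0)], 5)
candidates = [[300.0, 900.0], [420.0, 900.0], [480.0, 900.0],
              [610.0, 900.0], [700.0, 900.0]]

guided = viterbi_track(candidates, nominal)
continuity_only = viterbi_track(candidates, nominal, w_nominal=0.0)
print(guided)
print(continuity_only)
```

In this toy example the continuity-only search locks onto the perfectly smooth spurious candidate at 900 Hz, while the search guided by the nominal track follows the rising candidates. This mirrors, in a very simplified way, the motivation the abstract gives for using phoneme information.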

Articles published on Formant Tracking

Automated Measurement of Vowel Formants in the Buckeye Corpus

Formant Tracking Linear Prediction Model using HMMs for Noisy Speech Processing

Formant‐frequency trajectories as acoustic correlates to speech perception.

An acoustical comparison of English tense and lax vowels.

Measuring Norwegian dialect distances using acoustic features

Speaker Identification Using Instantaneous Frequencies

Bayesian formant tracking using conditionally linear Gaussian models

Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement

Analysis and Synthesis of Formant Spaces of British, Australian, and American Accents

Formant tracking linear prediction model using HMMs and Kalman filters for noisy speech processing

Compensation following real-time manipulation of formants in isolated vowels

Initialization, Training, and Context-Dependency in HMM-Based Formant Tracking

Robust Formant Tracking for Continuous Speech With Speaker Variability

A state-space model with neural-network prediction for recovering vocal tract resonances in fluent speech from Mel-cepstral coefficients

Formant tracking using context-dependent phonemic information

Knowledge-based formant tracking with confidence measure using dynamic programming

Turning speech into music in a two-dimensional space by varying the bandwidth and rate of tone pulses placed along formant tracks

Measuring Norwegian Dialect Distances Using Acoustic Features

Formant frequency estimation of high-pitched speech by homomorphic prediction

Objective analysis of the singing voice as a training aid
