Performance Of Mel Frequency Cepstral Coefficients Research Articles

The relationship between formant frequencies and vocal tract length (VTL) has been studied intensively over the years. By contrast, the connection between VTL and mel-frequency cepstral coefficients (MFCCs), which concisely encode the overall shape of a speaker’s spectral envelope in just a few cepstral coefficients, has been analyzed only modestly and is worth further investigation. Thus, based on different statistical models, this article explores the advantages and disadvantages of the latter, relatively novel approach in contrast to the former, which arises from more traditional studies. Additionally, VTL is usually assumed to be a static, inherent characteristic of a speaker; that is, a single length parameter is frequently estimated per speaker. By contrast, in this paper we consider VTL estimation from a dynamic perspective, using modern real-time Magnetic Resonance Imaging (rtMRI) to measure VTL in parallel with audio signals. To support the experiments, data obtained from USC-TIMIT magnetic resonance videos were used, allowing for the 2D real-time analysis of articulators in motion. As a result, we observed that the performance of MFCCs is higher in speaker-dependent modeling; however, in cross-speaker modeling, which uses different speakers’ data for training and evaluation, their performance is not significantly different from that obtained with formants. In addition, we note that estimation based on MFCCs is robust, with an acceptable computational time complexity, consistent with the traditional approach.
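To make concrete how a handful of cepstral coefficients can summarize a spectral envelope, the following is a minimal sketch of a textbook MFCC computation for a single frame. It is not the authors' pipeline: the function names, the Hamming window, the 26-filter bank, the 13 coefficients, and the unnormalized DCT-II are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # A common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising slope
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):     # falling slope
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    # Hypothetical helper: MFCCs of one frame of audio samples.
    n_fft = len(frame)
    windowed = frame * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(windowed)) ** 2 / n_fft
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    # Unnormalized DCT-II of the log filterbank energies; keeping only the
    # first n_coeffs terms retains the smooth envelope shape.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return basis @ log_energies
```

For example, a 512-sample frame at 16 kHz passed as `mfcc_frame(frame, 16000)` yields a vector of 13 coefficients; a regression model could then map such vectors to VTL measurements, though the specific statistical models used in the article are not described here.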

This paper compares Mel-Frequency Cepstral Coefficient (MFCC) and Linear Prediction Cepstral Coefficient (LPCC) features under three speaker conditions (waking up, being fully awake, and being tired) to determine which is better at handling the effect of these variations. A Gaussian Mixture Model (GMM) classifier was used for both features. Experimental results show an identification rate of 83.3% for the MFCC-based system when the speakers were just waking up, while the LPCC-based system had a lower identification rate of 75%. When the speakers were either fully awake or tired, the MFCC-based system achieved an identification rate of 100%, while the LPCC-based system had an identification rate of 91.7%. In speaker verification, under the first condition (waking up), there is a significant difference between the equal error rates (EER): 7.9% for MFCC and 22.0% for LPCC. There is also a significant difference between the total success rates (TSR) under this condition: 82.5% for MFCC and 65.0% for LPCC. Overall, MFCC achieved a better total success rate under the three conditions studied. General Terms: Speaker recognition, intra-speaker variability, session variability.
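The equal error rate quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. A minimal sketch of how it could be estimated from verification scores follows; the function name, the score arrays, and the threshold sweep are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    # Hypothetical helper: higher scores are assumed to mean "more likely
    # the claimed speaker". Sweeps every observed score as a threshold.
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine speakers wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # With finite data the two rates rarely match exactly, so the
            # midpoint at the closest threshold is used as the estimate.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With well-separated score distributions the estimate approaches zero; the more the genuine and impostor scores overlap, the higher it climbs.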

Articles published on Performance Of Mel Frequency Cepstral Coefficients

MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation

Comparative Study on the Performance of Mel-Frequency Cepstral Coefficients and Linear Prediction Cepstral Coefficients under different Speaker's Conditions
