Abstract

In this paper, we demonstrate that an auditory spectrogram based on a dynamic compressive gammachirp filterbank (GCFB) enables accurate and robust estimation of vocal tract length (VTL) for both voiced and whispered speech. Normalized VTLs of 21 speakers were derived by least-squares analysis of their pairwise VTL ratios (all 420 = 21P2 ordered pairs, i.e., 21 × 20), which were estimated by minimizing spectral distances between the auditory spectrograms. The distance calculation was restricted to a selected frequency range; the range from 500 to 5000 Hz was the most reasonable choice for both speech modes. The GCFB-based method performed better than the one based on the mel-frequency filterbank (MFFB). To confirm reliability, the estimated VTLs were compared with VTL data measured from MRI.
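The least-squares step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each pairwise estimate is a ratio VTL_i / VTL_j, fits the log-VTLs to the log-ratios, and fixes the overall scale by requiring the log-VTLs to sum to zero (geometric mean of 1). The function name `normalized_vtl_from_ratios`, the dictionary input format, and the choice of normalization are all hypothetical; the abstract does not specify them.

```python
import numpy as np


def normalized_vtl_from_ratios(ratios):
    """Least-squares normalized VTLs from pairwise ratio estimates.

    ratios: dict mapping an ordered speaker pair (i, j), i != j, to the
            estimated ratio VTL_i / VTL_j (assumed input format).
    Returns an array of VTLs scaled so that their geometric mean is 1
    (an assumed normalization).
    """
    n = 1 + max(max(i, j) for i, j in ratios)
    rows, rhs = [], []
    for (i, j), r in ratios.items():
        # log(VTL_i) - log(VTL_j) = log(r_ij), one equation per ordered pair
        row = np.zeros(n)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        rhs.append(np.log(r))
    # Ratios determine VTLs only up to a common factor, so add one
    # equation that pins the scale: sum of log-VTLs = 0.
    rows.append(np.ones(n))
    rhs.append(0.0)
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(x)


if __name__ == "__main__":
    # Synthetic check with made-up VTLs (cm) for 21 speakers.
    rng = np.random.default_rng(0)
    true_vtl = rng.uniform(13.0, 18.0, size=21)
    pairs = {(i, j): true_vtl[i] / true_vtl[j]
             for i in range(21) for j in range(21) if i != j}
    print(len(pairs))  # number of ordered pairs
    print(normalized_vtl_from_ratios(pairs))
```

Working in the log domain turns each ratio into a linear difference, so ordinary linear least squares applies; with noisy ratio estimates, the redundancy across all ordered pairs averages out individual errors.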
