Abstract
Formant tracking is investigated in this study using trackers based on dynamic programming (DP) and deep neural networks (DNNs). Using the DP approach, six formant estimation methods were first compared, including linear prediction (LP) algorithms, weighted LP algorithms and the recently developed quasi-closed phase forward-backward (QCP-FB) method. QCP-FB gave the best performance in this comparison. Building on this result, a novel formant tracking approach is proposed that combines the benefits of deep learning and of signal processing based on QCP-FB. In this approach, the formants predicted by a DNN-based tracker from a speech frame are refined using the peaks of the all-pole spectrum computed by QCP-FB from the same frame. The results show that the proposed DNN-based tracker outperformed the reference formant trackers in both detection rate and estimation error for the lowest three formants. Compared to the popular Wavesurfer, for example, the proposed tracker reduced the estimation error for the lowest three formants by 29%, 48% and 35%, respectively.
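The DP idea can be illustrated with a minimal Viterbi-style sketch that tracks a single formant. This is not the tracking algorithm of [8]. It assumes, hypothetically, that each frame supplies a non-empty list of candidate peak frequencies and that a path's cost is the sum of a local term (distance from an illustrative nominal frequency `nominal`) and a transition term (the frequency jump between consecutive frames). The weights `w_local` and `w_trans` are illustrative parameters.

```python
from typing import List, Sequence


def dp_track(candidates: Sequence[Sequence[float]], nominal: float,
             w_local: float = 1.0, w_trans: float = 1.0) -> List[float]:
    """Choose one candidate frequency per frame by minimizing total path cost.

    Hypothetical cost model (illustrative only):
      local cost      = w_local * |f - nominal|
      transition cost = w_trans * |f_t - f_{t-1}|
    Every frame must contain at least one candidate.
    """
    if not candidates:
        return []
    # cost[t][j]: best cumulative cost of any path ending at candidate j of frame t
    cost = [[w_local * abs(f - nominal) for f in candidates[0]]]
    # back[t][j]: index of the best predecessor in frame t-1 (unused for t = 0)
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        row: List[float] = []
        ptr: List[int] = []
        for f in candidates[t]:
            # Cumulative cost through each predecessor plus the jump penalty
            scores = [cost[-1][j] + w_trans * abs(f - prev[j]) for j in range(len(prev))]
            best = min(range(len(prev)), key=scores.__getitem__)
            row.append(scores[best] + w_local * abs(f - nominal))
            ptr.append(best)
        cost.append(row)
        back.append(ptr)
    # Backtrack from the cheapest candidate in the last frame
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path: List[float] = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    return path[::-1]
```

Each candidate is scored against every predecessor, so a frame with K candidates costs O(K^2). This is negligible for the handful of spectral peaks a frame typically yields.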
Highlights
Estimation and tracking of formant frequencies is an important research topic in several areas of speech science and technology [1]–[6].
These results demonstrate that the quasi-closed phase forward-backward covariance (QCP-FBCOV) method can be a good replacement for the widely used covariance-method linear prediction (LP-COV) analysis in formant estimation and tracking tools and applications.
The two most promising all-pole modeling methods (LP-FBCOV and QCP-FBCOV) were combined with a modern deep neural network (DNN)-based tracker in a novel formant tracking technique that combines the benefits of data-driven and model-driven approaches: the formants predicted by the data-driven DNN were refined using the frequencies of the peaks in the all-pole spectra computed by the model-driven LP-FBCOV and QCP-FBCOV methods.
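The refinement step can be sketched as follows, assuming the frame's all-pole model is available as a coefficient vector `a = [1, a_1, ..., a_p]` (as LP-FBCOV or QCP-FBCOV would produce). The helper names `allpole_peaks` and `refine`, the FFT-based peak picking and the nearest-peak rule with a maximum shift `max_shift_hz` are illustrative assumptions. They are not the published refinement procedure.

```python
import numpy as np


def allpole_peaks(a, fs: float, nfft: int = 1024) -> np.ndarray:
    """Frequencies (Hz) of the local maxima of the all-pole spectrum 1/|A(e^jw)|^2."""
    A = np.fft.rfft(np.asarray(a, dtype=float), nfft)  # A(e^jw) on nfft//2 + 1 bins
    power = 1.0 / np.maximum(np.abs(A) ** 2, 1e-12)    # guard against division by zero
    # Interior bins that exceed the left neighbour and are not below the right one
    idx = np.where((power[1:-1] > power[:-2]) & (power[1:-1] >= power[2:]))[0] + 1
    return idx * fs / nfft


def refine(predicted_hz, peaks_hz, max_shift_hz: float = 300.0) -> list:
    """Snap each predicted formant to the nearest spectral peak (hypothetical rule).

    A prediction is kept unchanged if no peak lies within max_shift_hz of it.
    """
    peaks = np.asarray(peaks_hz, dtype=float)
    refined = []
    for f in predicted_hz:
        if peaks.size == 0:
            refined.append(float(f))
            continue
        nearest = peaks[np.argmin(np.abs(peaks - f))]
        refined.append(float(nearest) if abs(nearest - f) <= max_shift_hz else float(f))
    return refined
```

Take peaks at 500 Hz and 1500 Hz with the default 300 Hz limit. Predictions of 480, 1600 and 3000 Hz become 500, 1500 and 3000 Hz. The last prediction is kept because no peak lies close enough to it.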
Summary
Estimation and tracking of formant frequencies is an important research topic in several areas of speech science and technology [1]–[6]. QCP analysis exploits the sample-selective prediction framework of weighted LP (WLP) by designing a temporal weighting function that emphasizes the closed-phase regions and de-emphasizes both the open phase and the region immediately after the main excitation. This yields more accurate closed-phase estimates of the vocal tract system, with reduced influence from the glottal source. For the DP-based formant trackers, formant tracking performance was investigated with the DP-based tracking algorithm proposed in [8] by comparing six different formant estimation methods, all of which use all-pole modeling. Taken together, the formant estimation error (FEE) and formant detection rate (FDR) values give a better sense of a formant tracker's performance than either metric alone.
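A weighted forward-backward covariance LP estimate can be sketched as a weighted least-squares problem. The problem minimizes the weighted sum of squared forward and backward prediction errors over the frame. The sketch makes two assumptions. First, the same weight `w[n]` is applied to the forward and the backward error at index `n`. Second, `qcp_like_weight` is a hypothetical rectangular dip around each glottal closure instant (GCI), not the actual QCP weighting function.

```python
import numpy as np


def weighted_fb_lp(x, order: int, w=None) -> np.ndarray:
    """Weighted forward-backward covariance LP; returns a = [1, a_1, ..., a_p].

    Minimizes sum_n w[n] * (e_f[n]^2 + e_b[n]^2) for n = order..N-1, where
      e_f[n] = x[n]       + sum_k a_k x[n-k]
      e_b[n] = x[n-order] + sum_k a_k x[n-order+k]
    Sharing w[n] between both errors is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    w = np.ones(n_samples) if w is None else np.asarray(w, dtype=float)
    rows, targets = [], []
    for n in range(order, n_samples):
        s = np.sqrt(w[n])  # scaling rows by sqrt(w) gives weighted least squares
        rows.append(s * x[n - order:n][::-1])       # forward regressors x[n-1], ..., x[n-p]
        targets.append(-s * x[n])
        rows.append(s * x[n - order + 1:n + 1])     # backward regressors x[n-p+1], ..., x[n]
        targets.append(-s * x[n - order])
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return np.concatenate(([1.0], a))


def qcp_like_weight(n_samples: int, gcis, before: int, after: int,
                    d: float = 1e-5) -> np.ndarray:
    """Hypothetical weighting: 1 everywhere except a dip to d around each GCI.

    The dip covers `before` samples of the preceding open phase and `after`
    samples following the main excitation (a crude stand-in for QCP weighting).
    """
    w = np.ones(n_samples)
    for g in gcis:
        lo, hi = max(0, g - before), min(n_samples, g + after + 1)
        w[lo:hi] = d
    return w
```

For a pure cosine x[n] = cos(ωn), both the forward and the backward second-order predictions are exact with a = [1, -2cos ω, 1]. The sketch therefore recovers these coefficients for any positive weights, which makes the case a convenient sanity check.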