Abstract

A widely used method for estimating the glottal source waveform and the vocal tract shape is to process the speech signal with an inverse filter and then fit a glottal source model to the residual signal. However, source-tract interaction reduces the estimation accuracy of this approach. In this paper, we propose a method that estimates the glottal source waveform and the vocal tract shape simultaneously, based on an analysis-by-synthesis approach with a source-filter model built from an autoregressive with exogenous input (ARX) model combined with the Liljencrants-Fant (LF) model. Because optimizing many parameters at once makes simultaneous estimation difficult, the method proceeds in two steps: first, the glottal source parameters are initialized using the inverse filter method; then, the glottal source parameters and the vocal tract shape are estimated accurately and simultaneously using the analysis-by-synthesis approach. Experimental results with synthetic and real speech signals showed that the proposed method achieves higher estimation accuracy than the inverse filter method.
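
The joint estimation idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: one LF period is generated with glottal closure at the end of the pitch period (Tc = T0) and repeated at constant pitch; the ARX model is taken as s[n] = sum_k a_k s[n-k] + b u[n] + e[n], where u is the LF flow derivative; and, in place of the paper's analysis-by-synthesis optimizer, each candidate in a small grid of Te values (standing in for a refinement around an inverse-filter initialization, which is not shown) is paired with a least-squares ARX fit, and the candidate with the smallest residual energy is kept. All function names and parameter values are hypothetical.

```python
import numpy as np


def _bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lf_derivative(fs, T0, Tp, Te, Ta, Ee=1.0):
    """One period of the LF glottal flow derivative (assumes closure Tc = T0)."""
    Tb = T0 - Te                          # length of the return phase
    wg = np.pi / Tp                       # open-phase angular frequency
    # eps solves eps*Ta = 1 - exp(-eps*Tb); its positive root lies in (0, 1/Ta)
    eps = _bisect(lambda x: x * Ta - 1.0 + np.exp(-x * Tb), 1e-6 / Ta, 1.0 / Ta)
    c = -Ee / np.sin(wg * Te)             # scales the open phase so e(Te) = -Ee

    def net_area(alpha):
        # closed-form integrals of the open phase and of the return phase
        open_area = c * (alpha * np.sin(wg * Te) - wg * np.cos(wg * Te)
                         + wg * np.exp(-alpha * Te)) / (alpha**2 + wg**2)
        return_area = -(Ee / (eps * Ta)) * (Ta - Tb * np.exp(-eps * Tb))
        return open_area + return_area

    # alpha makes the net glottal flow over one period zero; widen the bracket first
    lo, hi = -1.0 / T0, 1.0 / T0
    while net_area(lo) < 0:
        lo *= 2
    while net_area(hi) > 0:
        hi *= 2
    alpha = _bisect(net_area, lo, hi)

    t = np.arange(int(round(T0 * fs))) / fs
    open_phase = c * np.exp(alpha * (t - Te)) * np.sin(wg * t)
    return_phase = -(Ee / (eps * Ta)) * (np.exp(-eps * (t - Te)) - np.exp(-eps * Tb))
    return np.where(t <= Te, open_phase, return_phase)


def periodic_source(fs, T0, Tp, Te, Ta, n):
    """Repeat one LF period up to n samples (constant-pitch assumption)."""
    period = lf_derivative(fs, T0, Tp, Te, Ta)
    return np.tile(period, n // len(period) + 1)[:n]


def synthesize(u, a, b):
    """ARX synthesis s[n] = sum_k a[k-1] s[n-k] + b u[n] with zero initial state."""
    p = len(a)
    s = np.zeros(len(u) + p)              # the first p zeros are the initial state
    for n in range(len(u)):
        s[n + p] = np.dot(a, s[n:n + p][::-1]) + b * u[n]
    return s[p:]


def arx_fit(s, u, p):
    """Least-squares ARX coefficients for a known source u; returns (a, b, error)."""
    n = np.arange(p, len(s))
    X = np.column_stack([s[n - k] for k in range(1, p + 1)] + [u[n]])
    coef, *_ = np.linalg.lstsq(X, s[n], rcond=None)
    resid = s[n] - X @ coef
    return coef[:p], coef[p], float(resid @ resid)


def estimate(s, fs, T0, Tp, Ta, p, te_grid):
    """For each candidate Te, fit the vocal tract jointly; keep the lowest error."""
    best = None
    for te in te_grid:
        u = periodic_source(fs, T0, Tp, te, Ta, len(s))
        a, b, err = arx_fit(s, u, p)
        if best is None or err < best[0]:
            best = (err, te, a, b)
    return best[1:]  # (Te, a, b) of the best-fitting candidate


# Usage on a synthetic signal: LF source with Te = 4.5 ms through one resonance.
fs, T0, Tp, Ta = 16000, 0.008, 0.0032, 0.0003
a_true = np.array([2 * 0.95 * np.cos(2 * np.pi * 500 / fs), -0.95**2])
s = synthesize(periodic_source(fs, T0, Tp, 0.0045, Ta, 512), a_true, 1.0)
te, a, b = estimate(s, fs, T0, Tp, Ta, p=2, te_grid=[0.0040, 0.0045, 0.0050, 0.0055])
print(te, a, b)
```

On this noise-free synthetic signal the true Te should give an essentially zero residual, so the search should recover it together with the resonance coefficients. With real speech, a continuous search over all LF parameters (Tp, Te, Ta, Ee) and a higher ARX order would be needed.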
