Abstract
We model a segment of a filtered speech signal as a product of elementary signals rather than as a sum of sinusoids. This model makes it easier to appreciate the basic relationships between a signal's envelope and its phase or instantaneous frequency (IF), and these relationships reveal some interesting properties of the signal's modulations. For instance, if the contribution of the envelope, specifically the Hilbert transform of the log-envelope, is removed from the signal's phase, the resulting signal's IF is strictly positive. In addition, a filtered speech signal with a bandwidth of B Hz can essentially be represented by its log-envelope and IF, each of which also has a bandwidth of B Hz. We extend these ideas to decompose speech into modulated components. Specifically, a bank of data-adaptive filters in a cross-coupled configuration is used to decompose speech into its components. Each adaptive filter is a simple single-resonance bandpass filter, whose center frequency (pole location) closely follows the desired formant frequency, supplemented by an adaptive all-zero filter, whose zero locations sufficiently reduce unwanted leakage from neighboring formants. Each filtered component is then represented by its log-envelope and positive IF; this small number of modulations closely approximates the speech signal.
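The filter structure described above can be sketched with fixed coefficients. The following Python sketch is hypothetical. It assumes a complex one-pole resonator placed at the target formant, followed by an all-zero (FIR) section whose zeros lie exactly on the unit circle at the neighboring formants. The formant values and the 8 kHz sampling rate are made up. The sketch does not implement the data-adaptive tracking or the cross-coupling between filters; in the described system these coefficients would be updated from the data.

```python
import cmath
import math

FS = 8000.0  # sampling rate in Hz (assumed for illustration)

def unit_phasor(freq_hz):
    """exp(j*2*pi*f/FS): the point on the unit circle at frequency f."""
    return cmath.exp(2j * math.pi * freq_hz / FS)

def design_component_filter(formants_hz, target, radius=0.95):
    """Return (b, pole) for the component at formants_hz[target] (hypothetical design).

    pole: one complex pole at the target formant (the single resonance).
    b:    FIR taps of prod_i (1 - e^{j w_i} z^-1) over the other formants,
          i.e. zeros placed on the neighboring formant frequencies.
    """
    pole = radius * unit_phasor(formants_hz[target])
    b = [1 + 0j]
    for i, f in enumerate(formants_hz):
        if i == target:
            continue
        zero = unit_phasor(f)
        # Multiply the polynomial b(z^-1) by (1 - zero * z^-1).
        b = [(b[k] if k < len(b) else 0) - zero * (b[k - 1] if k >= 1 else 0)
             for k in range(len(b) + 1)]
    return b, pole

def apply_filter(x, b, pole):
    """y[n] = pole * y[n-1] + sum_k b[k] * x[n-k] on complex samples."""
    y, prev = [], 0j
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        prev = pole * prev + acc
        y.append(prev)
    return y

def gain(b, pole, freq_hz):
    """|H(e^{jw})| = |B(e^{jw})| / |1 - pole * e^{-jw}| at freq_hz."""
    zinv = 1 / unit_phasor(freq_hz)
    num = sum(bk * zinv ** k for k, bk in enumerate(b))
    return abs(num / (1 - pole * zinv))

formants = [500.0, 1500.0, 2500.0]  # hypothetical formant frequencies (Hz)
b, pole = design_component_filter(formants, target=1)
# Test input: one unit-amplitude complex exponential at each formant.
x = [sum(unit_phasor(f) ** n for f in formants) for n in range(400)]
y = apply_filter(x, b, pole)
print([round(gain(b, pole, f), 3) for f in formants])  # → [0.0, 11.716, 0.0]
```

Placing the zeros on the unit circle nulls the neighboring formants completely in steady state, so only the target component survives at the output. In an adaptive version, the pole and zeros would instead move as the formant estimates change. The zeros would then only need to reduce leakage sufficiently rather than cancel it exactly.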