Abstract

The multi-resolution common fate transform (MCFT) is an audio signal representation designed for mixtures of multiple audio signals that overlap in both time and frequency. The MCFT combines the invertibility of a state-of-the-art representation, the common fate transform (CFT), with the multi-resolution property of the cortical stage output of an auditory model. Because the MCFT is computed from a fully invertible complex time-frequency representation, audio sources with high time-frequency overlap can be separated directly in the MCFT domain, where sources overlap less than in the time-frequency domain. The MCFT circumvents the resolution issue of the CFT by using a multi-resolution two-dimensional (2D) filter bank instead of fixed-size 2D windows. This enables higher-quality separation without the need to hand-tune the window size for each case. In this work, we describe the MCFT, discuss its properties with the aid of illustrative examples, and provide definitions and objective measures for two desirable representation properties: separability of source signals and clusterability of the components of each signal. The utility of the MCFT for source separation is illustrated by performing ideal masking on a comprehensive dataset of audio mixtures of musical tones played in unison, including audio samples from a wide pitch range and a variety of instruments and playing techniques. Results show that ideal masks made in the MCFT domain yield better separability than those made in commonly used time-frequency signal representations or in the CFT domain. The MCFT also yields more reliable clusterability than the CFT in most cases.
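As a rough illustration of the ideal-masking evaluation mentioned above, the following sketch builds ideal binary masks from known source coefficients and applies them to a mixture. It is a minimal example under stated assumptions, not the authors' procedure: the abstract does not say which kind of ideal mask is used (binary is one common choice), the function names `ideal_binary_masks` and `apply_masks` are hypothetical, and random complex arrays stand in for the coefficients of any linear, invertible representation such as the MCFT.

```python
import numpy as np


def ideal_binary_masks(source_reps):
    """Return one boolean mask per source.

    source_reps: complex array of shape (n_sources, ...) holding the
    coefficients of each isolated source in the same representation.
    Each coefficient is assigned to the source with the largest magnitude.
    """
    magnitudes = np.abs(source_reps)
    winner = np.argmax(magnitudes, axis=0)
    n_sources = source_reps.shape[0]
    return np.stack([winner == k for k in range(n_sources)])


def apply_masks(mixture_rep, masks):
    """Multiply the mixture coefficients by each mask (broadcast over sources)."""
    return masks * mixture_rep


# Toy data: random complex coefficients stand in for real transform output.
rng = np.random.default_rng(0)
shape = (2, 8, 16)  # (sources, assumed coefficient grid)
sources = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Linearity of the representation: the mixture is the sum of the sources.
mixture = sources.sum(axis=0)

masks = ideal_binary_masks(sources)
estimates = apply_masks(mixture, masks)
```

In a real experiment, each masked estimate would be passed through the inverse transform to recover a time-domain signal, which is why the invertibility of the representation matters for this kind of evaluation.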
