Abstract

We propose a new method for speech source separation that is based on directionally-disjoint estimation of the transfer functions between microphones and sources at different frequencies and at multiple times. The spatial transfer functions are estimated from eigenvectors of the microphones' correlation matrix. Smoothing and association of transfer function parameters across different frequencies are performed by simultaneous extended Kalman filtering of the amplitude and phase estimates. This approach allows transfer function estimation even if the number of sources is greater than the number of microphones, and it applies to both wideband and narrowband sources. The method was evaluated in simulations, and the results show good performance.

Highlights

  • Many audio communication and entertainment applications deal with acoustic signals that contain combinations of several acoustic sources in a mixture that overlaps in time and frequency

  • We consider a simpler but still practical and largely overlooked situation: mixtures of source signals in weakly reverberant environments, such as speech or music recorded with close microphones

  • An additional difficulty occurs for the TF representation: independence between two signals in a certain band around ω corresponds to independence between narrowband processes, which can be revealed at time scales that are significantly longer than the window size or the effective impulse response of the bandpass filter used for TF analysis

Summary

INTRODUCTION

Many audio communication and entertainment applications deal with acoustic signals that contain combinations of several acoustic sources in a mixture that overlaps in time and frequency. We propose a new method for source separation in the echoic or slightly reverberant case that is based on estimating and clustering the spatial signatures (transfer functions) between the microphones and the sources at different frequencies and at multiple times. The transfer functions for each source-microphone pair are derived from eigenvectors of the correlation matrices between the microphone signals at each frequency. They are determined through a selection and clustering process that creates disjoint sets of eigenvector candidates for every frequency at multiple times. The proposed method can be used for approximate signal separation in underdetermined cases (more sources than microphones, for example more than two sources in a stereo recording) using filtering or time-frequency masking [8], in a manner similar to that of the W-disjoint situation.
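The eigenvector step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single frequency bin, two microphones, a time-frequency cell in which only one source is active, and a small amount of sensor noise. The function name `estimate_spatial_signature`, the eigenvalue-dominance measure, and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def estimate_spatial_signature(X):
    """Estimate a relative transfer function from one frequency bin.

    X has shape (n_mics, n_frames) and holds complex STFT values.
    Assumption: a single source dominates these frames.
    """
    n_frames = X.shape[1]
    # Sample correlation matrix between microphone signals.
    R = X @ X.conj().T / n_frames
    # eigh returns eigenvalues in ascending order for Hermitian input.
    eigvals, eigvecs = np.linalg.eigh(R)
    v = eigvecs[:, -1]  # principal eigenvector
    # Remove the arbitrary complex scale by referencing microphone 0
    # (assumes v[0] is not close to zero).
    signature = v / v[0]
    # Fraction of energy in the principal direction; values near 1
    # suggest that a single source is active in this cell.
    dominance = eigvals[-1] / eigvals.sum()
    return signature, dominance

# Synthetic single-source example (all values are illustrative).
rng = np.random.default_rng(0)
n_frames = 200
true_signature = np.array([1.0, 0.7 * np.exp(-1j * 0.9)])
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
noise = 0.01 * (rng.standard_normal((2, n_frames))
                + 1j * rng.standard_normal((2, n_frames)))
X = np.outer(true_signature, s) + noise

signature, dominance = estimate_spatial_signature(X)
print("estimated amplitude ratio:", abs(signature[1]))
print("estimated phase difference:", np.angle(signature[1]))
print("eigenvalue dominance:", dominance)
```

In this sketch the amplitude ratio and phase difference of the second entry play the role of the per-frequency transfer function parameters. Selecting which cells are single-source, clustering the candidates, and tracking them across frequency are separate steps that the sketch does not cover.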

BACKGROUND
PROPOSED SOURCE SEPARATION METHOD
Identification of single-source TF cells
Spatial transfer function estimation
TRACKING AND FREQUENCY ASSOCIATION ALGORITHM
Gaussian mixture model and extended Kalman filter
The separation algorithm
EXPERIMENTAL RESULTS
CONCLUSIONS
