Abstract

Fast fixed-point independent vector analysis (FastIVA) is an improved independent vector analysis (IVA) method that converges faster and separates sources better than the original IVA. Like other IVA methods, it is designed to solve the permutation problem in frequency-domain independent component analysis by retaining the higher-order statistical dependency between frequencies during learning. However, the performance of all IVA methods is limited by the dimensionality of the parameter space commonly encountered in practical frequency-domain source separation problems and by the spherical symmetry assumed in the source model. In this article, a particular permutation problem encountered when using the FastIVA algorithm is highlighted, namely the block permutation problem. To address it, a new audio-video-based fast fixed-point independent vector analysis algorithm is proposed, which uses video information to provide a smart initialization for the optimization problem. The method can not only avoid the ill convergence resulting from the block permutation problem but also improve the separation performance, even in noisy and highly reverberant environments. Different multisource datasets, including the real audio-video corpus AV16.3, are used to verify the proposed method. For the evaluation of the separation performance on real room recordings, a new pitch-based evaluation criterion is also proposed.

Highlights

  • The cocktail party problem was first described by Colin Cherry in 1953 [1]

  • To retain the dependency between different frequency bins, one method is joint blind source separation based on multiset canonical correlation analysis [25]; another widely used method is independent vector analysis, which is the focus of this article

  • We proposed an AVIVA algorithm which can use the geometric information obtained from video to set a proper initialization


Summary

Introduction

The cocktail party problem was first described by Colin Cherry in 1953 [1]. Cherry and Taylor [2] worked further on this problem, which is captured by the question: “How do we recognize what one person is saying when others are speaking at the same time (the ‘cocktail party problem’)?” To reduce the computational cost of time-domain methods, source separation problems are generally solved in the frequency domain. To retain the dependency between different frequency bins, one method is joint blind source separation based on multiset canonical correlation analysis [25]; another widely used method is independent vector analysis (IVA), which is the focus of this article. IVA preserves the higher-order statistical dependencies between frequency bins while removing the dependencies between sources [17]. It can therefore address the permutation problem during learning without other prior knowledge or post-processing. Fast fixed-point independent vector analysis (FastIVA) is a fast form of the IVA algorithm. It employs Newton’s-method update rules, which converge quadratically and avoid the need to select a suitable learning rate.
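
The fixed-point update mentioned above can be illustrated with a minimal NumPy sketch of one FastIVA iteration. The sketch makes several assumptions that are not stated in this text. It uses the nonlinearity G(z) = sqrt(z), which corresponds to a spherically symmetric Laplacian source prior. It expects input that has already been whitened in each frequency bin. It follows the commonly published form of the update, w ← E[G' + |y|² G''] w − E[y* G' x], followed by symmetric decorrelation. The function names `fastiva_step` and `symmetric_decorrelation` are hypothetical, and the code is not the authors' implementation.

```python
import numpy as np

def symmetric_decorrelation(W):
    """Return (W W^H)^(-1/2) W for every frequency bin; W has shape (K, N, N)."""
    # W W^H is Hermitian, so eigh applies to the whole stack of K matrices.
    vals, vecs = np.linalg.eigh(W @ W.conj().transpose(0, 2, 1))
    # Build V diag(1/sqrt(vals)) V^H by scaling the columns of V.
    inv_sqrt = (vecs / np.sqrt(vals)[:, None, :]) @ vecs.conj().transpose(0, 2, 1)
    return inv_sqrt @ W

def fastiva_step(X, W, eps=1e-12):
    """One fixed-point iteration (sketch).

    X: whitened mixtures, shape (K bins, N channels, T frames), complex.
    W: unmixing matrices, shape (K, N, N); row i of W[k] is w_i^H.
    """
    K, N, T = X.shape
    Y = W @ X                                   # source estimates, (K, N, T)
    r2 = np.sum(np.abs(Y) ** 2, axis=0) + eps   # energy summed over bins, (N, T)
    # Assumed nonlinearity G(z) = sqrt(z): first and second derivatives.
    Gp = 0.5 / np.sqrt(r2)
    Gpp = -0.25 / r2 ** 1.5
    # Scalar term E[G' + |y|^2 G''] for each bin and source, shape (K, N).
    a = np.mean(Gp[None] + np.abs(Y) ** 2 * Gpp[None], axis=2)
    # Row form of E[y* G' x]: E[G' y x^H], shape (K, N, N).
    B = np.einsum('knt,kmt->knm', Gp[None] * Y, X.conj()) / T
    W_new = a[:, :, None] * W - B
    # Coupling across bins enters only through r2; decorrelate per bin.
    return symmetric_decorrelation(W_new)
```

A typical use would start from `W = np.tile(np.eye(N, dtype=complex), (K, 1, 1))` and call `fastiva_step` repeatedly until `W` stops changing. Because every bin shares the same energy term `r2`, the update ties the frequency bins of one source together, which is how IVA discourages permutations between bins. The initialization is the part the proposed audio-video method replaces with a video-informed choice.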

It is evident that when ke ke
[Table residue: three sources; columns: permutation measurement, iterations, separation]
Findings
Conclusions
