Abstract

This paper proposes harmonic vector analysis (HVA), which is built on a general algorithmic framework for audio blind source separation (BSS) that is also presented here. BSS for a convolutive audio mixture is usually performed by multichannel linear filtering when the numbers of microphones and sources are equal (the determined situation). This paper addresses such determined BSS based on batch processing. Effective modeling of the source signals is important for estimating the demixing filters. One successful example is independent vector analysis (IVA), which models the signals via co-occurrence among the frequency components of each source. To give more freedom to the source modeling, a general framework of determined BSS is presented. It is based on the plug-and-play scheme using a primal-dual splitting algorithm and enables the source signals to be modeled implicitly through a time-frequency mask. With the proposed framework, determined BSS algorithms can be developed by designing masks that enhance the source signals. As an example of its application, we propose HVA by defining a time-frequency mask that enhances the harmonic structure of audio signals via sparsity of the cepstrum. The experiments showed that HVA outperforms IVA and independent low-rank matrix analysis (ILRMA) for both speech and music signals. MATLAB code is provided with the paper for reference.

Highlights

  • Blind source separation (BSS) is a methodology to recover the source signals from multiple mixtures without any knowledge about the mixing system.

  • T^cos_{λ,κ} is a smooth approximation of the hard-thresholding operator, as shown on the left side of Fig. 3. This shrinkage operator is adopted in harmonic vector analysis (HVA) for the following reasons: (1) like hard-thresholding, it has no bias for large coefficients; (2) we found that smoothness is important for stable separation; and (3) HVA does not need to force small coefficients to be exactly zero owing …

  • It can be seen that the proposed HVA was comparable to the other methods


Summary

Introduction

Blind source separation (BSS) is a methodology for recovering source signals from multiple mixtures (audio recordings in this paper) without any knowledge of the mixing system. Let the convolutive mixing process be approximated in the time-frequency domain as x[t, f] ≈ A[f]s[t, f], where x[t, f] ∈ C^M contains the M observed mixtures, s[t, f] ∈ C^N contains the N source signals, A[f] ∈ C^(M×N) is the mixing matrix, and t and f are the time and frequency indices. The aim of BSS is to recover the unknown source signals, s, only from the mixtures, x. In the determined (M = N) or overdetermined (M > N) situation, the usual strategy is to formulate an estimation problem of finding (or approximating) a demixing matrix, W[f] ∈ C^(N×M), that is a left inverse of A[f] (i.e., W[f]A[f] = I, where I is the identity matrix). The source signals are then recovered by multiplying the mixtures by the estimated demixing matrix: W[f]x[t, f] ≈ W[f]A[f]s[t, f] = s[t, f].
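The left-inverse relation above can be checked numerically with a minimal sketch. This is only an illustration under stated assumptions, not the paper's algorithm: the sizes `M`, `N`, `T`, `F`, the random mixing matrices, and the helper `crandn` are all hypothetical, and the demixing matrix is computed from a known A[f] via the pseudo-inverse, whereas in actual BSS the mixing system is unknown and W[f] has to be estimated from the mixtures alone.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 3, 2   # microphones, sources (overdetermined case, M > N)
T, F = 5, 4   # hypothetical numbers of time frames and frequency bins


def crandn(*shape):
    """Complex Gaussian array of the given shape (illustrative helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# Hypothetical ground truth: one mixing matrix A[f] per frequency bin.
A = crandn(F, M, N)   # A[f] has shape (M, N)
s = crandn(T, F, N)   # s[t, f] has shape (N,)

# Mixing model x[t, f] = A[f] s[t, f], applied to every (t, f) at once.
x = np.einsum("fmn,tfn->tfm", A, s)

# One valid left inverse of each A[f] is its pseudo-inverse.
# This uses the known A only to illustrate W[f] A[f] = I.
W = np.linalg.pinv(A)   # shape (F, N, M)

# Demixing: s_hat[t, f] = W[f] x[t, f].
s_hat = np.einsum("fnm,tfm->tfn", W, x)

left_inverse_ok = np.allclose(W @ A, np.eye(N))
recovery_ok = np.allclose(s_hat, s)
print(left_inverse_ok, recovery_ok)
```

Both checks are expected to hold whenever every A[f] has full column rank, which is what makes a left inverse exist in the determined and overdetermined situations.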
