Abstract

Blind Source Separation (BSS) remains a difficult problem in complex reverberant environments, where the room geometry changes and the number of speech sources grows. BSS is commonly addressed with independent component analysis (ICA), which typically relies on higher-order statistics. However, the permutation ambiguity between the desired speech sources remains a challenge for BSS applications. Independent Vector Analysis (IVA) resolves the permutation problem for BSS in the frequency domain. The performance of IVA depends on selecting an appropriate source prior that preserves the inter-frequency dependencies of each speech source across frequency bins. Therefore, a hybrid model for the IVA method is presented, which comprises multivariate generalized Gaussian and super-Gaussian source priors to model both low- and high-amplitude speech signals. The weights of the two source priors in the hybrid model are assigned according to the energy of the observed non-stationary speech mixture signal. In the simulations, different speech mixtures are generated from various speech sources using a simulated room model. The blind separation performance of the proposed approach is evaluated in terms of signal-to-distortion ratio (SDR) and compared with well-known BSS methods. The results show that, for non-stationary speech signals, the proposed method improves on state-of-the-art IVA models that use a fixed source prior.

Highlights

  • Human listeners are able to separate a desired speech signal from a complex auditory speech mixture, i.e. the cocktail party environment [1]

  • Three mixture scenarios, i.e. male-male, male-female, and female-female speech mixtures, are evaluated for the proposed hybrid model and for the methods of [8], [30], [33]

  • Blind Source Separation (BSS) performance is evaluated with the signal-to-distortion ratio (SDR) in dB, reported as the difference between the desired SDR and the speech mixture SDR, i.e. SDR = SDRdesired - SDRmixture
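The SDR difference in the last highlight can be illustrated with a minimal sketch. It assumes a simplified energy-ratio SDR (reference power over error power) rather than whichever SDR toolbox the paper actually used, and the function names `sdr_db` and `sdr_improvement_db` are hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference power divided by error power.

    Assumption: plain energy ratio, not the projection-based BSS Eval SDR.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def sdr_improvement_db(reference, separated, mixture):
    """SDR = SDRdesired - SDRmixture, both measured against the same reference."""
    return sdr_db(reference, separated) - sdr_db(reference, mixture)

# Toy signals: a target tone plus an interfering tone.
t = np.arange(16000) / 16000.0
target = np.sin(2 * np.pi * 220.0 * t)
interferer = np.sin(2 * np.pi * 330.0 * t)
mixture = target + interferer            # unprocessed observation
separated = target + 0.1 * interferer    # interference reduced tenfold in amplitude
gain = sdr_improvement_db(target, separated, mixture)
```

In this toy case the interference amplitude drops by a factor of 10, so the error power drops by a factor of 100 and the difference comes out at about 20 dB.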

Summary

INTRODUCTION

Human listeners are able to separate a desired speech signal from a complex auditory speech mixture, i.e. the cocktail party environment [1]. The heavy-tailed nature of the Student's t distribution significantly enhances separation performance [30]. Mixed source priors, such as the multivariate generalized Gaussian distribution (MGD) combined with the multivariate Student's t distribution, have been introduced to model the non-stationary nature of the observed mixture signal. State-of-the-art IVA methods use only fixed source prior distribution models in the separation process. A fixed prior cannot adequately model the non-stationary nature of the observed speech mixture, which degrades the separation performance of BSS applications in real-time environments. In this research work, a hybrid energy-based source prior is proposed, which enhances the robustness of speech processing applications by adapting the IVA algorithm to the non-stationary nature of the observed speech mixture.

MULTIVARIATE SOURCE PRIORS

The dependencies of a speech signal across different frequency bins can be modeled by a multivariate probability density function (pdf). In the presented research work, the performance of BSS is improved by a hybrid energy-driven model that combines the multivariate generalized Gaussian and the original multivariate super-Gaussian source priors instead of using a single identical source prior.
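How an energy-driven mix of two source priors could enter an IVA update can be sketched as follows. This is a sketch under stated assumptions, not the paper's formulation: the super-Gaussian score is the standard one from the original IVA prior, the exponent 1/3 in the generalized Gaussian prior is assumed, and the weighting rule (frame energy divided by the maximum frame energy, with the generalized Gaussian taking the high-energy weight) is purely hypothetical. All function names are illustrative.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent frames

def score_super_gaussian(s):
    """Score for p(s) proportional to exp(-||s||_2); s has shape (bins, frames)."""
    norm = np.sqrt(np.sum(np.abs(s) ** 2, axis=0, keepdims=True))
    return s / (norm + EPS)

def score_generalized_gaussian(s):
    """Score for p(s) proportional to exp(-(||s||_2^2)^(1/3)).

    Assumption: the exponent 1/3 is chosen for illustration only.
    """
    power = np.sum(np.abs(s) ** 2, axis=0, keepdims=True)
    return (2.0 / 3.0) * s / (power ** (2.0 / 3.0) + EPS)

def energy_weights(x):
    """Hypothetical per-frame weight in [0, 1]; frames are the last axis of x."""
    axes = tuple(range(x.ndim - 1))
    energy = np.sum(np.abs(x) ** 2, axis=axes)  # shape: (frames,)
    return energy / (np.max(energy) + EPS)

def hybrid_score(s, x):
    """Blend the two scores per frame using the mixture energy weight."""
    w = energy_weights(x)  # broadcasts over the frequency-bin axis of s
    return w * score_generalized_gaussian(s) + (1.0 - w) * score_super_gaussian(s)

rng = np.random.default_rng(0)
s = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))        # one source estimate
x = rng.standard_normal((2, 4, 5)) + 1j * rng.standard_normal((2, 4, 5))  # two-channel mixture
phi = hybrid_score(s, x)
```

Both scores use the norm taken over all frequency bins of a frame, which is what ties the bins of one source together. Which prior should dominate in high-energy frames is left open here; swapping `w` and `1.0 - w` gives the opposite assignment.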

PROPOSED HYBRID SOURCE PRIOR MODEL
OBJECTIVE EVALUATION
PERFORMANCE EVALUATION
CONCLUSION AND FUTURE WORK