The design of binaural multi-microphone speech enhancement algorithms can be viewed as a multi-criteria problem, since several requirements must be met simultaneously. The objective is not only to extract the target speaker without distortion, but also to suppress interfering sources (e.g., competing speakers) and ambient background noise, while preserving the auditory impression of the complete acoustic scene. Such a multi-objective problem (MOP) can be addressed via its Pareto frontier, which characterizes the attainable trade-offs between the different criteria. In this paper, we propose a unified Pareto optimization framework based on a generalized mean squared error (MSE) cost function derived from the MOP, so that the solution to the multi-criteria problem is grounded in a solid mathematical foundation. The MSE cost function is a weighted sum of speech distortion (SD), partial interference reduction (IR), and partial noise reduction (NR) terms, with scaling parameters that control the amount of IR and NR. The filter minimizing this generalized cost function, denoted the Pareto optimal binaural multichannel Wiener filter (Pareto-BMWF), generalizes various binaural MWF-based and binaural MVDR-based beamformers, and this solution is optimal for any set of parameters. Using real-signal recordings, we experimentally demonstrate the improved speech enhancement capabilities in the presence of estimation errors, and we analyze the binaural cue preservation capabilities.
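To make the weighted-sum structure of such a cost concrete, the following NumPy sketch solves one plausible instantiation for a single frequency bin. It assumes mutually uncorrelated target, interference, and noise components with known covariance matrices `R_s`, `R_i`, `R_n`, and models partial IR and partial NR as preserving fractions `eta` and `lam` of the interference and noise at a reference microphone. These quadratic forms, the weights `mu_i` and `mu_n`, the toy scene, and all helper names are hypothetical illustrations, not the paper's notation or derivation. Minimizing a sum of criteria with strictly positive weights is the standard weighted-sum scalarization for reaching Pareto-optimal points.

```python
import numpy as np


def unit_vector(M, ref):
    """Selection vector e that picks the reference microphone `ref`."""
    e = np.zeros(M, dtype=complex)
    e[ref] = 1.0
    return e


def weighted_mse_filter(R_s, R_i, R_n, ref, mu_i, mu_n, eta, lam):
    """Closed-form minimizer of a hypothetical weighted-sum MSE cost (one bin).

    J(w) = (w - e)^H R_s (w - e)                  # speech distortion (SD)
         + mu_i (w - eta e)^H R_i (w - eta e)     # partial interference reduction
         + mu_n (w - lam e)^H R_n (w - lam e)     # partial noise reduction
    Setting the gradient to zero gives A w = b with
    A = R_s + mu_i R_i + mu_n R_n and b = (R_s + mu_i eta R_i + mu_n lam R_n) e.
    """
    e = unit_vector(R_s.shape[0], ref)
    A = R_s + mu_i * R_i + mu_n * R_n
    b = (R_s + mu_i * eta * R_i + mu_n * lam * R_n) @ e
    return np.linalg.solve(A, b)


def cost_terms(w, R_s, R_i, R_n, ref, eta, lam):
    """Return the (SD, partial IR, partial NR) terms of J(w) as real numbers."""
    e = unit_vector(R_s.shape[0], ref)

    def quad(v, R):
        return float(np.real(v.conj() @ R @ v))

    return quad(w - e, R_s), quad(w - eta * e, R_i), quad(w - lam * e, R_n)


# Hypothetical 4-microphone scene (2 mics per hearing aid); not data from the paper.
rng = np.random.default_rng(0)
M = 4
a_s = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # assumed target ATF
a_i = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # assumed interferer ATF
R_s = np.outer(a_s, a_s.conj())          # rank-one target covariance
R_i = 0.5 * np.outer(a_i, a_i.conj())    # rank-one interferer covariance
R_n = 0.1 * np.eye(M)                    # spatially white noise (assumed)

# Binaural output: one filter per ear, each referenced to a mic on that side.
for ear, ref in (("left", 0), ("right", 2)):
    w = weighted_mse_filter(R_s, R_i, R_n, ref, mu_i=5.0, mu_n=1.0, eta=0.1, lam=0.2)
    sd, ir, nr = cost_terms(w, R_s, R_i, R_n, ref, eta=0.1, lam=0.2)
    print(f"{ear}: SD={sd:.4f}  IR={ir:.4f}  NR={nr:.4f}")

# Raising mu_i moves along the trade-off: the IR term cannot increase.
ir_curve = [
    cost_terms(weighted_mse_filter(R_s, R_i, R_n, 0, m, 1.0, 0.1, 0.2),
               R_s, R_i, R_n, 0, 0.1, 0.2)[1]
    for m in (0.1, 1.0, 10.0, 100.0)
]
print("IR vs mu_i:", [round(v, 6) for v in ir_curve])
```

In this sketch, setting `eta = lam = 1` returns the unprocessed reference signal (`w = e`), while `eta = lam = 0` with `mu_i = mu_n` reduces to a speech-distortion-weighted MWF, which illustrates how a single weighted cost can cover several classical filters as special cases.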