The aim of this work is the separation of foreground speech from background sound sources by selective remixing of bandpass-filtered channels. The remixing parameters must be dynamic because the speech and noise spectra are highly nonstationary. Remixing parameters are recomputed at onsets, which are detected using biologically motivated techniques [L. S. Smith and D. S. Fraser, IEEE TNNS 15, 1125–1134 (2004)]. However, onsets may originate from either the foreground or the background. To select the onsets that belong to the foreground source (whose direction is known), a two-microphone system is used: an onset is kept only if the direction estimated in its channel corresponds to the foreground direction. Two techniques for direction estimation are used: a channel-by-channel short-term autocorrelation technique, and a channel-by-channel spike-based phase synchronous system (SBPSS), which computes interaural time differences (ITDs) [L. S. Smith, in Artificial Neural Networks, Proc ICANN 2001, LNCS 2130, pp. 1103–1108 (Springer, 2001)] and interaural intensity differences (IIDs) [L. S. Smith, in From Animals to Animats, Vol. 7, pp. 60–61 (MIT Press, 2002)]. Results are presented comparing the performance of autocorrelation and SBPSS on single-source and source-plus-noise signals in an office environment. [Work supported by UK EPSRC.]
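The onset-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the direction in one bandpass channel is summarized by the sample lag that maximizes a short-term cross-correlation between the two microphone signals, and that an onset is kept when that lag is within a tolerance of the lag expected for the foreground direction. The names `estimate_lag` and `select_onsets` and the parameters `window`, `max_lag`, and `tol` are hypothetical.

```python
def estimate_lag(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    of two equal-length segments. A positive lag means the right-microphone
    signal is delayed relative to the left-microphone signal."""
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def select_onsets(onsets, left, right, target_lag, window, max_lag, tol=1):
    """Keep the onset times (sample indices) whose estimated lag, measured
    over a short window starting at the onset, is within tol samples of
    target_lag (the lag assumed for the known foreground direction)."""
    kept = []
    for t in onsets:
        seg_left = left[t:t + window]
        seg_right = right[t:t + window]
        if len(seg_left) < window or len(seg_right) < window:
            continue  # not enough signal after this onset to estimate a lag
        if abs(estimate_lag(seg_left, seg_right, max_lag) - target_lag) <= tol:
            kept.append(t)
    return kept
```

In this sketch `left` and `right` would be the outputs of the same bandpass channel at the two microphones, so the selection is repeated independently for each channel.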