Sound source separation and synthesis for audio enhancement based on spectral amplitudes of two-channel stereo signals

Masayuki Nishiguchi, Shoichi Takane, Ayumu Morikawa, Kanji Watanabe, Koji Abe

doi:10.1121/1.4971036

Abstract

A sound source separation algorithm based on the spectral amplitudes of 2-channel signals has been developed for the up-mixing playback of 2-channel stereo. Short-time Fourier transforms (STFTs) of the signals on the left and right channels are first calculated. The coefficients of the discrete Fourier transform (DFT) are used to calculate the ratio of the spectral amplitudes of the left and right channels, which is termed the channel level difference (CLD). The DFT coefficients are then divided into multiple groups on the basis of the CLD, with each group representing a separated sound source. The signal-to-distortion ratio (SDR) is used to evaluate the signal separation performance. It was found that a rough estimate of the CLD threshold yielding the best SDR could be obtained by cross-correlating the separated sounds. For playback on a headset, each separated signal is convolved with the head-related transfer functions (HRTFs) that represent the direction of that particular sound source. Subjective listening tests showed that the sound synthesized by this method is more realistic than that synthesized with HRTFs that represent only left and right speakers.
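
The CLD-based grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `stft` and `separate_by_cld`, the frame length and hop size, the expression of the CLD in decibels, and the threshold values of -6 dB and +6 dB are all assumptions chosen for illustration. The abstract does not state how many groups are used or how the CLD is scaled, and the SDR evaluation, threshold estimation by cross-correlation, and HRTF rendering are not shown.

```python
import numpy as np

def stft(x, frame_len=1024, hop=512):
    """Hann-windowed short-time Fourier transform; returns (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def separate_by_cld(left, right, thresholds_db=(-6.0, 6.0), eps=1e-12):
    """Split the DFT coefficients of both channels into groups by CLD.

    thresholds_db are hypothetical, ascending CLD boundaries in dB.
    Returns one (L_group, R_group) pair of masked STFTs per group.
    """
    L, R = stft(left), stft(right)
    # CLD: ratio of left to right spectral amplitude, here expressed in dB
    # (assumption). eps avoids division by zero in silent bins.
    cld_db = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # Assign every time-frequency bin to exactly one group.
    group = np.digitize(cld_db, thresholds_db)
    n_groups = len(thresholds_db) + 1
    return [(L * (group == g), R * (group == g)) for g in range(n_groups)]
```

With the two assumed thresholds, group 0 collects bins dominated by the right channel, group 1 collects bins of roughly equal level, and group 2 collects bins dominated by the left channel. Because each bin belongs to exactly one group, the groups sum back to the original STFT of each channel.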
