Sound source classification using temporal pattern for mixed sounds

Toru Taniguchi, Mikio Tohyama, Katsuhiko Shirai

doi:10.1121/1.4788163

Abstract

In speech information processing for real-world environments, handling nonstationary acoustic noise such as background music or speech remains an open problem, although much progress has been made in speech processing under stationary noise. Music and speech have complex spectral and temporal structures composed of many musical notes or phonemes, which makes it difficult to estimate a music or speech signal separated from foreground speech. For noise estimation from one-channel sound, the only information available besides the signal itself is prior knowledge of the temporal or spectral structure of the noise. Dealing with nonstationary music or speech noise therefore essentially requires models that represent their temporal and spectral structure. In our study, a method for classifying nonstationary sounds using sinusoidal decomposition is proposed. In the method, a classification of mixed sounds into sound categories, such as speech, singing voices, or instruments, is performed b...
