This paper describes a method for separating a monaural music signal into harmonic components (e.g., a guitar) and percussive components (e.g., a snare drum). Separating these two components is a useful preprocessing step for many music information retrieval applications. It can also serve as a new kind of music equalizer in its own right, one that lets listeners freely adjust the volume ratio of the guitar and the drums. Because of these potential applications, there have been many attempts to develop such a technique, especially in the last decade. However, some state-of-the-art techniques rely on costly operations, such as multiplication of large matrices and Monte Carlo methods, which may be a barrier to practical use on small devices such as smartphones. This paper describes an efficient method that does not depend on these costly operations. In formulating the methods, the authors assumed only the anisotropic smoothness of the music spectrogram, which may be one of the most minimal models that reflect the nature of these instruments. Specifically, the authors assumed that, on a music spectrogram, harmonic instruments are smooth in time, while percussive instruments are smooth in frequency. On the basis of this assumption, source separation methods are formulated as optimization problems that optimize the anisotropic smoothness under certain conditions. Because the model is simple, the derived algorithms are also quite simple. Experimental results show that the methods were effective compared to a state-of-the-art technique, and that the computation time was much shorter than that of an existing method; specifically, the methods can process a three-minute song in around 4-20 seconds on a laptop PC.
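The anisotropic smoothness assumption can be illustrated with a small numerical sketch. The following is not the authors' separation algorithm; it only measures how rough a toy spectrogram is along each axis. The function name `roughness`, the squared-difference measure, and the synthetic spectrograms are all assumptions made for illustration.

```python
import numpy as np


def roughness(spec):
    """Return (time_roughness, freq_roughness) for a magnitude spectrogram.

    spec has shape (n_freq_bins, n_frames). Each value is the sum of squared
    differences between neighbouring bins along one axis (an assumed,
    illustrative smoothness measure; lower means smoother).
    """
    time_rough = float(np.sum(np.diff(spec, axis=1) ** 2))  # across frames
    freq_rough = float(np.sum(np.diff(spec, axis=0) ** 2))  # across frequency bins
    return time_rough, freq_rough


n_bins, n_frames = 64, 100

# Toy "harmonic" source: sustained partials, i.e. horizontal ridges.
harmonic = np.zeros((n_bins, n_frames))
harmonic[[10, 20, 30], :] = 1.0

# Toy "percussive" source: broadband onsets, i.e. vertical ridges.
percussive = np.zeros((n_bins, n_frames))
percussive[:, [25, 50, 75]] = 1.0

h_time, h_freq = roughness(harmonic)
p_time, p_freq = roughness(percussive)

print("harmonic   (time, freq):", h_time, h_freq)
print("percussive (time, freq):", p_time, p_freq)
```

In this toy setting the harmonic source has zero roughness along time and the percussive source has zero roughness along frequency, which is the asymmetry that the optimization problems described in the abstract exploit.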