Abstract

We address the problem of separating a monaural mixture of harmonic sounds into the audio signals of individual semitones in an unsupervised manner. Unsupervised monaural audio source separation has thus far been mainly addressed by two approaches: one rooted in computational auditory scene analysis (CASA) and the other based on non-negative matrix factorization (NMF). These approaches focus on different clues for making source separation possible. A CASA-based method called harmonic-temporal clustering (HTC) focuses on a local time-frequency structure of individual sources, whereas NMF focuses on a global time-frequency structure of music spectrograms. These clues do not conflict with each other and can be used to achieve a more reliable audio source separation algorithm. Hence, we propose a monaural audio source separation framework, harmonic-temporal factor decomposition (HTFD), by developing a spectrogram model that encompasses the features of the models used in the NMF and HTC approaches. We further incorporate a source-filter model to build an extension of HTFD, source-filter HTFD (SF-HTFD). We derive efficient parameter estimation algorithms of HTFD and SF-HTFD based on the auxiliary function principle. We show, through music source separation experiments, the efficacy of HTFD and SF-HTFD compared with conventional methods. Furthermore, we demonstrate the effectiveness of HTFD and SF-HTFD for automatic musical key transposition.

Highlights

  • Audio source separation, a technique of separating a mixture audio signal into individual source signals, has a wide variety of applications, including automatic music transcription and music editing/remixing

  • We developed an automatic musical key transposition system, which works by performing the following four steps. (i) Given a music audio signal and its key, we first separated the magnitude spectrogram of the input signal into that associated with each semitone by using harmonic-temporal factor decomposition (HTFD), source-filter HTFD (SF-HTFD), or harmonic NMF (HNMF). (ii) We selected the semitones to be transposed according to the source and target key and shifted only the separated magnitude spectrograms corresponding to the selected semitones in the log-frequency direction

  • The spectrogram model of HTFD concurrently offers the advantages of the harmonic-temporal clustering (HTC) and non-negative matrix factorization (NMF) models, in which the regularities underlying both the local and global time-frequency structures of music spectrograms are exploited
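Step (ii) of the key transposition procedure above, shifting only the selected semitones along the log-frequency axis, can be illustrated with a minimal sketch. It assumes the per-semitone magnitude spectrograms from step (i) are already available as a NumPy array with a log-spaced frequency axis. The function name `transpose_semitones`, the array layout, and the integer bin shift are hypothetical choices made for illustration and do not come from the paper; how the shifted spectrograms are turned back into audio is not covered here.

```python
import numpy as np

def transpose_semitones(separated, selected, shift_bins):
    """Shift selected semitone spectrograms along the log-frequency axis and remix.

    separated  : array of shape (n_semitones, n_freq_bins, n_frames) holding
                 non-negative magnitude spectrograms on a log-frequency axis
                 (assumed layout, not taken from the paper)
    selected   : iterable of semitone indices to shift
    shift_bins : integer shift in bins (positive = towards higher frequency)
    """
    separated = np.asarray(separated, dtype=float)
    shifted = separated.copy()
    n_bins = separated.shape[1]
    for k in selected:
        moved = np.zeros_like(separated[k])
        if shift_bins == 0:
            moved = separated[k].copy()
        elif abs(shift_bins) >= n_bins:
            pass  # everything is shifted out of range, so the result stays zero
        elif shift_bins > 0:
            moved[shift_bins:, :] = separated[k][:n_bins - shift_bins, :]
        else:
            moved[:shift_bins, :] = separated[k][-shift_bins:, :]
        shifted[k] = moved
    # The transposed mixture spectrogram is the sum over all semitones.
    return shifted.sum(axis=0)

# Toy usage: two "semitones", six log-frequency bins, one frame.
toy = np.zeros((2, 6, 1))
toy[0, 1, 0] = 1.0   # energy of semitone 0 in bin 1
toy[1, 2, 0] = 2.0   # energy of semitone 1 in bin 2
remixed = transpose_semitones(toy, selected=[1], shift_bins=2)
```

On a log-frequency axis a constant bin shift corresponds to a constant musical interval, which is why a plain array shift suffices in this sketch; the number of bins per semitone depends on the analysis settings and is left to the caller.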


Summary

INTRODUCTION

Audio source separation, a technique of separating a mixture audio signal into individual source signals, has a wide variety of applications, including automatic music transcription and music editing/remixing. By developing a spectrogram model that encompasses the features of the models used in the HTC and NMF approaches, we propose a monaural audio source separation framework, which we call harmonic-temporal factor decomposition (HTFD). Since the continuous wavelet transform (CWT) is not an orthogonal transform, using the excitation-filter product representation in the CWT domain is not well justified. To overcome this issue, we derive an explicit parameter relationship between the HTFD spectrogram model and a source-filter model defined in the discrete time domain, following the idea described in [19]. This relationship allows us to model spectral changes associated with pitch and timbre separately and to reveal the underlying meanings of the excitation-filter product representation in the CWT domain. We define the proposed models in the magnitude spectrogram domain instead of the power spectrogram domain since we have found it to enhance the separation performance.
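As background for the NMF side of the model, the sketch below factorizes a non-negative magnitude spectrogram with the standard multiplicative updates for the generalized Kullback-Leibler divergence. This is plain NMF, not the HTFD or SF-HTFD algorithm, and it includes neither the harmonic structure nor the priors of the proposed models. The function names `nmf_kl` and `kl_divergence`, the random initialization, and the iteration count are assumptions made for illustration.

```python
import numpy as np

def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence between non-negative arrays Y and X."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

def nmf_kl(Y, n_components, n_iter=100, seed=0, eps=1e-12):
    """Approximate Y (n_freq x n_frames) by W @ H with non-negative factors."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    W = rng.random((n_freq, n_components)) + eps    # spectral templates
    H = rng.random((n_components, n_frames)) + eps  # temporal activations
    history = []
    for _ in range(n_iter):
        X = W @ H + eps
        # Update templates: ratio-weighted activations over activation sums.
        W *= ((Y / X) @ H.T) / (H.sum(axis=1) + eps)
        X = W @ H + eps
        # Update activations: ratio-weighted templates over template sums.
        H *= (W.T @ (Y / X)) / (W.sum(axis=0)[:, None] + eps)
        history.append(kl_divergence(Y, W @ H))
    return W, H, history

# Toy usage on a synthetic rank-2 "magnitude spectrogram".
rng = np.random.default_rng(1)
Y_toy = rng.random((8, 2)) @ rng.random((2, 20))
W_est, H_est, history = nmf_kl(Y_toy, n_components=2, n_iter=50)
```

Each column of `W_est` plays the role of a spectral template shared across the whole recording, which is the global time-frequency regularity that the NMF approach exploits. The multiplicative form keeps both factors non-negative without any explicit projection step.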

Continuous Wavelet Transform of Source Signal Model
Parameter Decomposition Into Time-Dependent and Time-Independent Factors
Probability Distribution of Observed Spectrogram
Relation to NMF and HTC Models
DESIGN OF PRIOR DISTRIBUTIONS OF HTFD
Maximum a Posteriori Estimation Problem
Auxiliary Function Principle
Update Rules
Parameter Relationship Between Source-Filter Model and HTFD Spectrogram Model
Generative Model of SF-HTFD
PARAMETER OPTIMIZATION ALGORITHM OF SF-HTFD
Separation of Harmonic Sounds With Time-Varying F0s
Source-Filter Model Representation in CWT Domain
Unsupervised Monaural Audio Source Separation
Data Preparation and Separation Procedure
HTFD Experiments
SF-HTFD Experiments
Demonstration on Automatic Musical Key Transposition
CONCLUSION