Abstract
Applications of Non-linear Component Extraction to Spectrogram Representations of Auditory Data

Jörg Bornschein1* and Jörg Lücke1

1 Frankfurt Institute for Advanced Studies, Germany

The state-of-the-art in component extraction for many types of data is based on variants of models such as principal component analysis (PCA), independent component analysis (ICA), sparse coding (SC), factor analysis (FA), or non-negative matrix factorization (NMF). These models are linear in the sense that they assume the data to consist of linear superpositions of hidden causes, i.e., they try to explain the data with linear superpositions of generative fields. This assumption becomes obvious in the generative interpretation of these models [1].

For many types of data, the assumption of linear component superposition is a good approximation. An example is the superposition of air-pressure waveforms. In contrast, we here study auditory data represented in the frequency domain. We consider data similar to those processed by the human auditory system just after the cochlea. Such data are closely aligned with log-power-spectrogram representations of auditory signals. It has long been known that the superposition of components in such data is non-linear and well approximated by a point-wise maximum of the individual spectrograms [2].

For component extraction from auditory spectrogram data, we therefore investigate learning algorithms based on a class of generative models that assume a non-linear superposition of data components. The component extraction algorithm of Maximal Causes Analysis (MCA; [3]) assumes a point-wise maximum combination where other algorithms use the sum. Training such non-linear models is, in general, computationally expensive but can be made feasible using approximation schemes based on Expectation Maximization (EM). Here we apply an EM approximation scheme based on the pre-selection of the most probable causes for every data point. The approximation yields approximate maximum-likelihood solutions and significantly reduces the computational complexity, while at the same time allowing for an efficient, parallelized implementation running on clustered compute nodes.

To evaluate the applicability of non-linear component extraction to auditory spectrogram data, we generated training data by randomly choosing and linearly mixing waveforms from a set of 10 different phonemes (sampled at 8000 Hz). We then applied an MCA algorithm based on EM and pre-selection. The algorithm was presented with only the log-spectrograms of the mixed signals. Assuming Gaussian noise, the algorithm was able to extract the log-spectrograms of the individual phonemes. We obtained similar results for different forms of phoneme mixtures, including mixtures in which three randomly chosen phonemes were combined per data point.
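The max-superposition property that motivates MCA can be checked directly on toy signals. The following is a minimal sketch (Python/NumPy/SciPy), not the authors' implementation: two synthetic sine waveforms stand in for phoneme recordings, the STFT parameters are illustrative assumptions, and only the 8000 Hz sampling rate is taken from the abstract. It shows that the log-power spectrogram of a linear waveform mixture is close to the point-wise maximum of the individual log-power spectrograms.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000  # sampling rate, as in the abstract

def log_power_spec(x, fs=fs, nperseg=256, noverlap=128):
    """Log-power spectrogram of a 1-D waveform (STFT parameters are illustrative)."""
    _, _, S = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return np.log(S + 1e-12)  # small offset avoids log(0)

# Toy stand-ins for two phoneme waveforms (hypothetical signals).
t = np.arange(0, 0.5, 1.0 / fs)
x1 = np.sin(2 * np.pi * 300 * t)    # low-frequency component
x2 = np.sin(2 * np.pi * 1200 * t)   # high-frequency component

# Linear superposition in the time domain ...
mix = x1 + x2

# ... is approximately a point-wise maximum in the log-spectrogram domain.
S_mix = log_power_spec(mix)
S_max = np.maximum(log_power_spec(x1), log_power_spec(x2))

# Per-bin deviation is bounded by log(2) wherever the component powers simply add.
print("mean |deviation| per bin (log units):", np.abs(S_mix - S_max).mean())
```

This small deviation between the mixture spectrogram and the point-wise maximum is the observation, noted in the abstract, that makes a max combination rule a more faithful generative assumption for log-spectrogram data than the sum used by linear models.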