Abstract

We aim to investigate how closely neural networks (NNs) mimic human thinking. As a step in this direction, we study the behavior of artificial neuron(s) that fire most when the input data score high on some specific emergent concepts. In this paper, we focus on music, where the emergent concepts are those of rhythm, pitch and melody as commonly used by humans. As a black box to pry open, we focus on Google’s MusicVAE, a pre-trained NN that handles music tracks by encoding them in terms of 512 latent variables. We show that several hundred of these latent variables are “irrelevant”, in the sense that they can be set to zero with minimal impact on reconstruction accuracy. The remaining few dozen latent variables can be sorted in order of relevance by comparing their variances. We show that the first few most relevant variables, and only those, correlate highly with dozens of human-defined measures that describe rhythm and pitch in music pieces, thereby efficiently encapsulating many of these human-understandable concepts in a few nonlinear variables.
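
The following is a minimal sketch, not the paper's released code, of the kind of analysis the abstract describes: ranking MusicVAE latent dimensions by variance, flagging near-zero-variance dimensions as candidates for pruning, and correlating the most relevant dimensions with a human-defined musical measure. The latent codes `z`, the variance threshold, and the feature `note_density` are illustrative placeholders, not values from the paper.

```python
# Hypothetical sketch: assume `z` holds MusicVAE latent codes of shape
# (n_tracks, 512) obtained from the encoder, and `note_density` is one
# human-defined rhythm/pitch measure per track. Both are synthetic here.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 512))        # placeholder for encoded music tracks
note_density = rng.normal(size=1000)    # placeholder human-defined measure

# 1. Rank latent dimensions by their variance across the corpus.
variances = z.var(axis=0)
order = np.argsort(variances)[::-1]     # most "relevant" dimensions first

# 2. "Irrelevant" dimensions: variance close to zero, so setting them to
#    zero should barely change the reconstructions (threshold is illustrative).
irrelevant = np.where(variances < 1e-3)[0]
print(f"{len(irrelevant)} dimensions below the variance threshold")

# 3. Correlate the few most relevant dimensions with the musical measure.
for dim in order[:5]:
    r, p = pearsonr(z[:, dim], note_density)
    print(f"latent dim {dim}: var={variances[dim]:.3f}, r={r:.3f}, p={p:.3g}")
```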
