Abstract

This paper presents an energy-based probabilistic model that handles nonnegative data in consideration of both linear and logarithmic scales. In audio applications, the magnitude of a time-frequency representation, such as the spectrogram, is regarded as one of the most important features. Such magnitude-based features have been extensively utilized in learning-based audio processing. Since the logarithmic scale is important in terms of auditory perception, the features are usually computed with a logarithmic function. That is, a logarithmic function is applied within the computation of the features so that a learning machine does not have to explicitly model the logarithmic scale. We take a different approach and propose a restricted Boltzmann machine (RBM) that simultaneously models linear- and log-magnitude spectra. An RBM is a stochastic neural network that can discover data representations without supervision. To manage both linear and logarithmic scales, we define an energy function based on both of them. This energy function results in a conditional distribution (of the observable data, given the hidden units) that is written as the gamma distribution, and hence the proposed RBM is termed the gamma-Bernoulli RBM. The proposed gamma-Bernoulli RBM was compared to the ordinary Gaussian-Bernoulli RBM by speech representation experiments. Both objective and subjective evaluations illustrated the advantage of the proposed model.
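To make the link between the energy function and the gamma conditional concrete, here is a minimal derivation sketch in our own notation (the paper's exact parameterization may differ). In an RBM, the cross term between visible and hidden units is linear in the visible units once the hidden units are fixed, so an energy containing linear terms in both v and log v exponentiates to the kernel of a gamma density:

```latex
% Sketch in our notation; the paper's exact parameterization may differ.
% Given hidden units h, suppose the visible part of the energy is
\[
  E(v \mid h) \;=\; \sum_i \Bigl[ \beta_i(h)\, v_i
    \;-\; \bigl(\alpha_i(h) - 1\bigr) \log v_i \Bigr],
  \qquad \alpha_i(h) > 0, \;\; \beta_i(h) > 0,
\]
% where alpha_i(h) and beta_i(h) are affine in h through the weights.
% Exponentiating the negative energy gives
\[
  p(v \mid h) \;\propto\; e^{-E(v \mid h)}
  \;=\; \prod_i v_i^{\alpha_i(h) - 1}\, e^{-\beta_i(h)\, v_i},
\]
% i.e., a product of independent gamma densities,
%   v_i | h ~ Gamma(alpha_i(h), beta_i(h)),
% whose support is automatically restricted to v_i > 0.
```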

Highlights

  • Learning data representation is a fundamental task, and many methods have been proposed, e.g., variational autoencoders (VAEs) [1]–[3], generative adversarial networks (GANs) [4]–[7], autoregressive (AR) models [8], [9], and normalizing flows [10], [11]

  • The proposed restricted Boltzmann machine (RBM) represents the conditional distribution of the visible units by the gamma distribution, which naturally limits the domain of data to positive numbers

  • We introduced a general gamma Boltzmann machine and showed that its conditional distribution is given by the gamma distribution

Summary

INTRODUCTION

Learning data representation is a fundamental task, and many methods have been proposed, e.g., variational autoencoders (VAEs) [1]–[3], generative adversarial networks (GANs) [4]–[7], autoregressive (AR) models [8], [9], and normalizing flows [10], [11]. The Gaussian-Bernoulli RBM [17], [18] has been utilized for modeling signals through their magnitude spectra. We propose a variant of RBMs, called the gamma-Bernoulli RBM, for modeling magnitude spectra in consideration of both linear and logarithmic scales. To manage the two scales, we define an energy function consisting of the usual quadratic term and an additional log-magnitude term. This energy function provides a general gamma Boltzmann machine that simultaneously considers linear- and log-magnitude spectra. The proposed RBM represents the conditional distribution of the visible units (given the hidden units) by the gamma distribution, which naturally limits the domain of the data to positive numbers. The optimal model among the proposed RBMs was investigated by speech representation experiments. Both objective and subjective evaluations illustrated the advantage of the gamma-Bernoulli RBM.
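As a concrete illustration of this structure, the following is a minimal NumPy sketch of block Gibbs sampling in a gamma-Bernoulli RBM. The parameterization (the names W, U, b, c, d and the way the hidden units set the gamma shape and rate) is our illustrative assumption, not the paper's exact formulation; it only demonstrates how a gamma conditional over the visible units pairs with a Bernoulli conditional over the hidden units.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, U, b, c, d):
    """One block Gibbs sweep of an illustrative gamma-Bernoulli RBM.

    Assumed energy (our notation, for illustration only):
        E(v, h) = -b.v - v.(W h) - c.log(v) - log(v).(U h) - d.h
    so that, up to normalization, p(v, h) ~ exp(-E) and
        v_i | h ~ Gamma(shape = 1 + c_i + (U h)_i, rate = -(b_i + (W h)_i))
        p(h_j = 1 | v) = sigmoid(d_j + (W.T v)_j + (U.T log v)_j).
    Shape and rate must stay positive; here we simply clip them,
    a crude stand-in for a proper parameter constraint.
    """
    # Sample hidden units given the visible spectrum and its log.
    p_h = sigmoid(d + W.T @ v + U.T @ np.log(v))
    h = (rng.random(p_h.shape) < p_h).astype(float)

    # Sample visible units from the gamma conditional.
    shape = np.clip(1.0 + c + U @ h, 1e-3, None)
    rate = np.clip(-(b + W @ h), 1e-3, None)
    v_new = rng.gamma(shape, 1.0 / rate)  # NumPy takes scale = 1 / rate
    return v_new, h

# Tiny usage example with random parameters (dimensions are arbitrary).
n_vis, n_hid = 8, 4
W = rng.normal(scale=0.01, size=(n_vis, n_hid))
U = rng.normal(scale=0.01, size=(n_vis, n_hid))
b = -np.ones(n_vis)   # negative bias keeps the gamma rate positive
c = np.ones(n_vis)    # positive bias keeps the gamma shape positive
d = np.zeros(n_hid)
v = rng.gamma(2.0, 1.0, size=n_vis)  # nonnegative "magnitude spectrum"
for _ in range(5):
    v, h = gibbs_step(v, W, U, b, c, d)
print(v, h)
```

Note that the visible conditional depends on v only through v and log v, which is exactly the sense in which the model treats the linear and logarithmic scales simultaneously.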

Boltzmann Machine
Bernoulli-Bernoulli RBM
Gaussian-Bernoulli RBM
GAMMA BOLTZMANN MACHINE
Proposed Gamma Boltzmann Machine
Transition from Gamma Boltzmann Machine to RBM
Proposed Gamma-Bernoulli RBM
Implementation of Gamma-Bernoulli RBM
Objective Function and Parameter Optimization
Some Extensions of the Proposed Boltzmann Machines
EXPERIMENTS
Experimental Configuration
Properties of the Proposed Gamma-Bernoulli RBM
Performance Comparison with the Conventional RBM
Performance Comparison with Deep Neural Networks
Data Compression by the Binary Representation
Balance between the Linear and Logarithmic Scales
Gamma-gamma RBM
Findings
CONCLUSION