Abstract

Autoencoders are self-supervised learning systems trained so that the output approximates the input. A typical autoencoder has three parts: an encoder, which produces a compressed latent representation of the input data; a latent space, which retains that representation at reduced dimensionality while preserving as much information as possible; and a decoder, which reconstructs the input data from the latent representation. Autoencoders have found wide application in dimensionality reduction, object detection, image classification, and image denoising. Variational autoencoders (VAEs) can be regarded as enhanced autoencoders in which a Bayesian approach is used to learn the probability distribution of the input data. VAEs have found wide application in generating speech, images, and text. In this paper, we present a comprehensive overview of variational autoencoders. We discuss problems with VAEs and present several VAE variants that attempt to solve them. We present applications of variational autoencoders in finance (a new and emerging field of application), speech/audio source separation, and biosignal processing. Experimental results are presented for a speech source separation example to illustrate the power of three VAE variants: the VAE, the β-VAE, and the ITL-AE. We conclude with a summary, and we identify possible areas of research for improving the performance of VAEs in particular and of deep generative models in general, of which VAEs and generative adversarial networks (GANs) are examples.
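As a concrete illustration of the encoder/latent-space/decoder structure and the β-weighted objective mentioned above, here is a minimal sketch in PyTorch. The paper does not prescribe an implementation; the layer sizes, variable names, and the Bernoulli (binary cross-entropy) reconstruction term are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal variational autoencoder: encoder -> latent space -> decoder."""

    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        # Encoder: maps the input to the parameters (mean, log-variance)
        # of a Gaussian approximate posterior q(z|x) over the latent space.
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: reconstructs the input from a latent sample z.
        self.dec1 = nn.Linear(latent_dim, hidden_dim)
        self.dec2 = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps keeps the sampling
        # step differentiable with respect to the encoder parameters.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def decode(self, z):
        # Sigmoid output assumes inputs are scaled to [0, 1].
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
    # Negative ELBO: a reconstruction term plus a KL term pulling q(z|x)
    # toward the standard normal prior p(z).
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Illustrative usage with random data in [0, 1]:
# model = VAE()
# x = torch.rand(128, 784)
# recon, mu, logvar = model(x)
# loss = vae_loss(recon, x, mu, logvar, beta=4.0)  # beta = 1 -> plain VAE
# loss.backward()

With beta = 1 this is the standard VAE objective; beta > 1 gives the β-VAE trade-off between reconstruction fidelity and a more disentangled latent space.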

Highlights

  • One of the distinct traits of human intelligence is the ability to imagine and synthesize. Generative modeling in machine learning aims to train algorithms to synthesize completely new data, such as audio, text, and images; it does so by estimating the density of the data and sampling from that estimated density

  • [Figure] Experimental results from [114] (graphs reproduced from that paper): (a) Signal-to-Artifact Ratio (SAR), (b) Signal-to-Distortion Ratio (SDR), (c) Signal-to-Interference Ratio (SIR)

  • [Figure] Results for the pairs A (64 ms, 16 ms), B (64 ms, 32 ms), C (32 ms, 8 ms), D (32 ms, 16 ms), E (16 ms, 8 ms), F (16 ms, 4 ms), G (8 ms, 4 ms), and H (8 ms, 2 ms): (a) SAR, (b) SDR, (c) SIR

Introduction

Generative modeling in machine learning aims to train algorithms to synthesize completely new data, such as audio, text, and images; it does so by estimating the density of the data and sampling from that estimated density. Information theory is key to machine learning, from the use of information-theoretic measures as loss functions [5,6,7] to their use for analysis through the information bottleneck framework [8,9,10]. It is therefore helpful to know the key measures of information theory. Chief among them is entropy, which can be thought of as a way to measure the uncertainty in a random vector or random process x with joint PDF p(x); it is a generalization of the variance of a process.
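For reference, the entropy referred to here, the differential entropy of a continuous random vector x with joint PDF p(x), has the standard definition (our notation, not necessarily the paper's):

h(\mathbf{x}) = -\int p(\mathbf{x}) \log p(\mathbf{x}) \, d\mathbf{x} = -\mathbb{E}_{p(\mathbf{x})}\!\left[\log p(\mathbf{x})\right]

Like the variance, it grows as the distribution spreads out; for a scalar Gaussian, h(x) = (1/2) log(2πe σ²) is a monotone function of the variance, which is the sense in which entropy generalizes variance.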
