Stochastic Restoration of Heavily Compressed Musical Audio Using Generative Adversarial Networks

Stefan Lattner, Javier Nistal

doi:10.3390/electronics10111349

Abstract

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals at 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments using objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.

Highlights

We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
Inspired by a recent work demonstrating that DNNs implementing complex operators [58] may outperform previous architectures in many audio-related tasks, new state-of-the-art performances were achieved on speech enhancement using complex representations of audio data [14,15].
The main representation used in the proposed method is the complex STFT components of the audio data, h_{j,k} ∈ ℂ^{JK}, since this representation has been shown to work well for audio generation with GANs [67].
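
To make the complex STFT representation mentioned in the last highlight more concrete, the following is a minimal sketch of how a signal can be turned into complex time-frequency components and then split into real and imaginary channels. It is not the authors' pipeline: the frame length, hop size, Hann window, naive DFT, and the function names `stft` and `to_real_imag_channels` are all illustrative assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive complex STFT (illustrative only; sizes are assumptions, not the paper's)."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame, one complex value per bin.
        spectrum = [
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


def to_real_imag_channels(frames):
    """Split complex components into two real-valued channels (real, imaginary)."""
    real = [[c.real for c in frame] for frame in frames]
    imag = [[c.imag for c in frame] for frame in frames]
    return real, imag


# Toy input: a cosine that falls exactly on bin 2 of an 8-sample frame.
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
H = stft(signal)
real, imag = to_real_imag_channels(H)
peak_bin = max(range(len(H[0])), key=lambda k: abs(H[0][k]))
```

Keeping the real and imaginary parts as separate channels retains phase information that a magnitude-only spectrogram would discard; in practice a library FFT would replace the naive DFT used here.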

Summary

Introduction

Most of the works in this line of research tackle the enhancement of speech signals [7,8,9,10,12,13,14,15,16,17,18], and only a few publications exist for musical audio restoration [11,19,20,21]. It has already been shown that strong generative models can enhance heavily corrupted speech through resynthesis with neural vocoders [22]. Along these lines, examining a generative (i.e., stochastic) decoder for heavily compressed audio signals may contribute to insights about more efficient musical data storage and transmission. Audio examples of the work are provided on the accompanying website: https://sonycslparis.github.io/restoration_mdpi_suppl_mat/ (accessed on 4 June 2021).

Related Work

Bandwidth Extension

Audio Enhancement

Materials and Methods

Model Architecture

Architecture Details

Gated Convolutions

Frequency Aggregation Filters

Training Procedure

Preventing Mode Collapse

Data Representation

Evaluation

Objective Difference Grade and Distortion Index

Log-Spectral Distance

Mean Squared Error

Signal-to-Noise Ratio

Mean Opinion Score

Results and Discussion

Objective Evaluation

Informal Listening

Formal Listening

Conclusions and Future Work

Author Biography


Journal: Electronics	Publication Date: Jun 5, 2021
Citations: 3	License type: CC BY 4.0


