Abstract
The objective of audio inpainting is to fill a gap in an audio signal. This is ideally done by reconstructing the original signal or, at least, by inferring a meaningful surrogate signal. We propose a novel approach applying sparse modeling in the time-frequency (TF) domain. In particular, we devise a dictionary learning technique that learns the dictionary from reliable parts around the gap, with the goal of obtaining a signal representation with increased TF sparsity. This is based on a basis optimization technique that deforms a given Gabor frame such that the sparsity of the analysis coefficients of the resulting frame is maximized. Furthermore, we modify the SParse Audio INpainter (SPAIN) for both the analysis and the synthesis model such that it is able to exploit the increased TF sparsity and, in turn, benefits from dictionary learning. Our experiments demonstrate that the developed methods achieve significant gains in terms of signal-to-distortion ratio (SDR) and objective difference grade (ODG) compared with several state-of-the-art audio inpainting techniques.
Highlights
Audio signals are often prone to localized distortions resulting in modification or even loss of certain sections.
While [18] investigates weighted ℓ1-norms to cope with these systematic biases, we propose a modification of the original SParse Audio INpainter (SPAIN) algorithm: instead of the mentioned segment-wise processing scheme, we apply the algorithm to the signal parts in the neighborhood of the gap as a whole.
By means of the resulting sparsity-enhanced dictionary, our modified SPAIN algorithm exhibits a significantly improved reconstruction performance.
Summary
Audio signals are often prone to localized distortions resulting in modification or even loss of certain sections. A signal processing technique that aims at restoring such gaps, i.e., missing consecutive samples, while keeping perceptible audio artifacts as small as possible is usually referred to as audio inpainting [1]. It has attracted a lot of attention as it has various important applications. Some of the first methods were proposed by Janssen et al. [5], [6] and Etter [7]. These techniques exploit available signal information in the original (time) domain and fill the missing samples by linear prediction using (learned) autoregressive coefficients. For a more comprehensive study on autoregressive-based audio inpainting we refer to, e.g., [8]–[10].
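The idea of filling a gap by linear prediction with learned autoregressive (AR) coefficients can be illustrated with a minimal sketch. This is not the method of [5]–[7]: it is a simplified, forward-only variant that assumes the AR coefficients are fitted by ordinary least squares on the samples preceding the gap and that the gap is then filled by recursive prediction. The function names `fit_ar` and `extrapolate`, the model order, and the test tone are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares AR fit: x[n] ~ sum_k a[k] * x[n-1-k] (illustrative helper)."""
    # Each row holds the `order` samples preceding x[n], most recent first.
    rows = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    targets = x[order:]
    a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return a

def extrapolate(context, a, n_missing):
    """Fill n_missing samples by recursive forward prediction (illustrative helper)."""
    order = len(a)
    buf = list(context[-order:])
    out = []
    for _ in range(n_missing):
        past = np.array(buf[-order:][::-1])  # most recent sample first
        nxt = float(a @ past)
        out.append(nxt)
        buf.append(nxt)  # predicted samples feed later predictions
    return np.array(out)

# Assumed toy signal: a 440 Hz tone sampled at 8 kHz with an 80-sample gap.
fs = 8000
t = np.arange(800) / fs
signal = np.sin(2 * np.pi * 440 * t)
gap_start, gap_len = 400, 80

context = signal[:gap_start]           # reliable samples before the gap
a = fit_ar(context, order=2)           # a pure sinusoid obeys an order-2 recursion
filled = extrapolate(context, a, gap_len)
err = np.max(np.abs(filled - signal[gap_start:gap_start + gap_len]))
```

A pure tone is the ideal case for this sketch because it satisfies a two-term recursion exactly. For real audio, a higher model order would be needed and the prediction error would grow with the gap length, since each predicted sample is fed back into later predictions.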
More From: IEEE Journal of Selected Topics in Signal Processing