Learning long-term filter banks for audio source separation and audio scene classification

Teng Zhang, Ji Wu

doi:10.1186/s13636-018-0127-7


Open Access


Abstract

Filter banks on short-time Fourier transform (STFT) spectrograms have long been studied to analyze and process audio. The frameshift in the STFT procedure determines the temporal resolution. However, many discriminative audio applications require long-term time and frequency correlations. In this work, we use Toeplitz-matrix-motivated filter banks to extract long-term time and frequency information. This paper investigates the mechanism of long-term filter banks and the corresponding spectrogram reconstruction method. The time duration and shape of the filter banks are carefully designed and learned using neural networks. We test our approach on different tasks. Compared with traditional frequency filter banks, the spectrogram reconstruction error in the audio source separation task is reduced by 6.7% relative, and the classification error in the audio scene classification task is reduced by 6.5% relative. The experiments also show that the time duration of long-term filter banks in the classification task is much larger than in the reconstruction task.
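To make the baseline concrete, here is a minimal NumPy sketch (not the authors' implementation) of a magnitude STFT followed by a conventional frequency filter bank. It assumes a Hann window, n_fft=512, a hop (frameshift) of 256 samples at 16 kHz, and a random non-negative weight matrix standing in for a hypothetical learned filter bank with 40 outputs. The hop fixes the temporal resolution (256 / 16000 = 16 ms), and each output frame depends on exactly one input frame, which is the limitation long-term filter banks address.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=256):
    """Magnitude STFT; the hop (frameshift) sets the temporal resolution."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_fft // 2 + 1)

def frequency_filter_bank(spec, weights):
    """Conventional frequency filter bank: each output frame uses one input frame only."""
    return spec @ weights  # (T, F) @ (F, m) -> (T, m)

sr = 16000
x = np.random.default_rng(0).standard_normal(sr)  # 1 s of noise as a stand-in signal
spec = stft_magnitude(x, n_fft=512, hop=256)
# Hypothetical filter-bank weights: F = 257 frequency bins mapped to m = 40 bands
weights = np.abs(np.random.default_rng(1).standard_normal((spec.shape[1], 40)))
feats = frequency_filter_bank(spec, weights)
print(spec.shape, feats.shape, 256 / sr)  # (61, 257) (61, 40) 0.016
```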

Highlights

Audio signals in a realistic environment are typically composed of different sound sources.
The time duration of long-term filter banks is limited by σ_k, the strength of each frequency bin is reconstructed by α_k, and the total number of parameters is reduced from 2mT in Eq. 2 to 2m in Eq. 3.
A novel framework of filter banks that can extract long-term time and frequency correlations is proposed in this paper.
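The σ_k / α_k parameterization in the highlights can be illustrated with a small sketch. The source only states that σ_k limits the time duration and α_k sets the strength of each frequency bin; the Gaussian shape used here, the window half-width, and the helper names `gaussian_long_term_filters` and `apply_long_term` are hypothetical assumptions for illustration, not the paper's exact equations. The point it shows is that a shape-constrained filter needs only two values per bin (2m in total), however many frames the window spans.

```python
import numpy as np

def gaussian_long_term_filters(sigma, alpha, half_width):
    """ASSUMED Gaussian shape: filter k is alpha_k * exp(-tau^2 / (2 * sigma_k^2))."""
    tau = np.arange(-half_width, half_width + 1)  # time offsets in frames
    return alpha[:, None] * np.exp(-tau[None, :] ** 2 / (2.0 * sigma[:, None] ** 2))

def apply_long_term(spec, filters):
    """Filter each frequency bin along time with its own kernel (zero-padded edges)."""
    _, m = spec.shape
    half = filters.shape[1] // 2
    padded = np.pad(spec, ((half, half), (0, 0)))
    out = np.empty_like(spec, dtype=float)
    for k in range(m):
        # out[t, k] = sum over tau in [-half, half] of w_k(tau) * spec[t + tau, k]
        out[:, k] = np.correlate(padded[:, k], filters[k], mode="valid")
    return out

m = 4
sigma = np.array([1.0, 2.0, 4.0, 8.0])  # hypothetical per-bin widths (frames)
alpha = np.ones(m)                       # hypothetical per-bin strengths
filters = gaussian_long_term_filters(sigma, alpha, half_width=10)
spec = np.zeros((50, m))
spec[25] = 1.0                           # an impulse at frame 25 in every bin
out = apply_long_term(spec, filters)
n_params = sigma.size + alpha.size       # 2m values, independent of the window length
print(filters.shape, n_params)  # (4, 21) 8
```

With an impulse input, each output column reproduces that bin's filter, so a larger σ_k visibly spreads energy over more frames.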

Summary

Introduction

Audio signals in a realistic environment are typically composed of different sound sources. Yet humans have no problem organizing the elements into their sources to recognize the acoustic environment. Wang and Chang [22] proposed neural networks organized in a two-dimensional space to model the time and frequency organization of audio elements. They used two-dimensional Gaussian lateral connectivity and global inhibition to parameterize the network, where the two dimensions correspond to frequency and time, respectively. The time duration differs, but the filter shape is constant for every frame. This mechanism can be implemented using a Toeplitz-matrix-motivated network.
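One way to see the Toeplitz connection: if the same filter shape is applied at every frame, filtering one frequency bin's trajectory over time equals multiplying it by a Toeplitz matrix, whose entries depend only on the offset between output and input frames (constant along each diagonal). The minimal NumPy sketch below checks this equivalence; the kernel length, sequence length, and the helper name `toeplitz_from_kernel` are arbitrary illustrations, not values or code from the paper.

```python
import numpy as np

def toeplitz_from_kernel(kernel, n):
    """n x n matrix with M[t, s] = kernel[s - t + half] for |s - t| <= half, else 0."""
    half = len(kernel) // 2
    M = np.zeros((n, n))
    for t in range(n):
        for s in range(max(0, t - half), min(n, t + half + 1)):
            M[t, s] = kernel[s - t + half]
    return M

rng = np.random.default_rng(0)
kernel = rng.standard_normal(5)   # hypothetical filter spanning 5 frames
x = rng.standard_normal(20)       # trajectory of one frequency bin over 20 frames
M = toeplitz_from_kernel(kernel, len(x))
y_matrix = M @ x                                        # matrix form
y_filter = np.correlate(np.pad(x, 2), kernel, mode="valid")  # same filter slid over frames
print(np.allclose(y_matrix, y_filter))  # True
```

Because the matrix is fully determined by the kernel, learning the kernel (or a few shape parameters of it) is much cheaper than learning an unconstrained n x n matrix.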

Long-term filter banks

Toeplitz motivation

Shape constraint

Spectrogram reconstruction

Audio scene classification

Method

Findings

Conclusions


Journal: EURASIP Journal on Audio, Speech, and Music Processing	Publication Date: May 30, 2018
Citations: 3	License type: open-access


Similar Papers


Discriminative frequency filter banks learning with neural networks
Teng Zhang ... Ji Wu
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2019
03 Jan 2019

An Adversarial Feature Distillation Method for Audio Classification
Liang Gao ... Yuxing Peng
IEEE Access | VOL. 7
01 Jan 2019

Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu ... Qiuqiang Kong
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
24 Mar 2024

Novel algorithm for auditory compensation in hearing aids
G K Girisha
Indian Journal of Science and Technology | VOL. 13
30 Dec 2020
