End-to-End Learning for Musical Instruments Classification

Renato Profeta, Gerald Schuller

doi:10.1109/ieeeconf53345.2021.9723181

Abstract

Musical instrument classification is a widely studied topic in Music Information Retrieval (MIR) and signal processing. Its applications range from audio database indexing, automatic transcription, and recommender systems to music search by timbre and music annotation, among others. Over the years, many techniques have been applied to this task, including deep neural networks operating on either hand-engineered or learned features [1] [2] [3]. This paper presents Convolutional Neural Network (CNN)-based filter banks that not only generate features optimized for classification in the encoded domain but also achieve near-perfect reconstruction at the decoder output, with quality similar to that of standard lossy audio codecs. The filter banks are then compared with other invertible transformations commonly used as features in classification problems, such as Short-Time Fourier Transform (STFT) spectrograms and Mel spectrograms, using the same simple classifier with a small number of parameters. The idea is that the heavy lifting is done by the learned features rather than by the classifier, while near-perfect reconstruction is still achieved.
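
A CNN-based filter bank of the kind described above can be pictured as a strided 1-D convolution acting as the analysis stage (encoder) and a transposed strided convolution acting as the synthesis stage (decoder). The following is a minimal sketch under stated assumptions, not the authors' architecture: instead of learned kernels it uses fixed sine-windowed MDCT filters (kernel length 2N, stride N), chosen only because with these weights perfect reconstruction can be verified exactly; in a trained filter bank the kernel values would be learned. The function names `mdct_filters`, `analysis`, `synthesis` and `encode_decode` are hypothetical, and the code is plain Python so it runs without a deep learning framework.

```python
import math
import random
from typing import List


def mdct_filters(n_bands: int) -> List[List[float]]:
    """Hypothetical stand-in for learned kernels: n_bands sine-windowed MDCT
    filters of length 2*n_bands, scaled so the lapped basis is orthonormal."""
    n = n_bands
    window = [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]
    scale = math.sqrt(2.0 / n)
    return [
        [window[i] * scale * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
         for i in range(2 * n)]
        for k in range(n)
    ]


def analysis(x: List[float], filters: List[List[float]], stride: int) -> List[List[float]]:
    """Encoder: strided 1-D cross-correlation, i.e. what a bias-free Conv1d layer
    with one input channel computes. Returns one subband signal per filter."""
    k = len(filters[0])
    n_frames = (len(x) - k) // stride + 1
    return [
        [sum(h[j] * x[t * stride + j] for j in range(k)) for t in range(n_frames)]
        for h in filters
    ]


def synthesis(bands: List[List[float]], filters: List[List[float]], stride: int,
              length: int) -> List[float]:
    """Decoder: transposed strided convolution (bias-free ConvTranspose1d).
    Each subband coefficient adds a scaled copy of its filter to the output."""
    k = len(filters[0])
    y = [0.0] * length
    for h, band in zip(filters, bands):
        for t, c in enumerate(band):
            for j in range(k):
                y[t * stride + j] += c * h[j]
    return y


def encode_decode(x: List[float], n_bands: int) -> List[float]:
    """Zero-pad so every input sample lies in two overlapping frames, run the
    encoder and decoder with stride n_bands, then strip the padding."""
    filters = mdct_filters(n_bands)
    extra = (-len(x)) % n_bands  # round the length up to a multiple of n_bands
    padded = [0.0] * n_bands + list(x) + [0.0] * (n_bands + extra)
    bands = analysis(padded, filters, stride=n_bands)
    y = synthesis(bands, filters, stride=n_bands, length=len(padded))
    return y[n_bands:n_bands + len(x)]


if __name__ == "__main__":
    random.seed(0)
    signal = [random.uniform(-1.0, 1.0) for _ in range(16 * 8)]
    rebuilt = encode_decode(signal, n_bands=8)
    # Maximum reconstruction error; only floating-point round-off remains.
    print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Because the stride equals the number of filters, the encoder is critically sampled (about one coefficient per input sample), and because the overlapping filters form an orthonormal lapped basis, the transposed convolution inverts the analysis exactly. A learned filter bank trained for classification would generally give up this exactness, which is consistent with the abstract describing near-perfect rather than perfect reconstruction.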
