Abstract

This work investigates the impact of Mel-Frequency Cepstral Coefficients (MFCC), Chroma DCT-Reduced Pitch (CRP), and Chroma Energy Normalized Statistics (CENS) features on instrument recognition from monophonic instrumental music clips using three deep learning techniques: Bidirectional Recurrent Neural Networks with Long Short-Term Memory (BRNN-LSTM), Stacked Autoencoders (SAE), and Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM). First, MFCC, CENS, and CRP features are extracted from instrumental music clips collected from various online libraries to form the dataset. The deep neural network models are then built by training them on the extracted features. Recognition rates of 94.9%, 96.8%, and 88.6% are achieved using combined MFCC and CENS features, and 90.9%, 92.2%, and 87.5% are achieved using combined MFCC and CRP features, with the BRNN-LSTM, CNN-LSTM, and SAE models, respectively. The experimental results show that combining MFCC features with CENS and CRP features at the score level improves the performance of the proposed system.
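The abstract describes a pipeline of per-clip feature extraction followed by score-level fusion of classifier outputs. The sketch below illustrates those two steps under stated assumptions: it uses librosa (the paper does not name its tooling), covers only the MFCC and CENS features, and the file path, number of coefficients, and fusion weight are placeholders, not values from the paper.

```python
# Minimal sketch of feature extraction and score-level fusion, assuming librosa.
# The clip path, n_mfcc, and fusion weight are illustrative placeholders.
import numpy as np
import librosa

def extract_features(path, sr=22050, n_mfcc=13):
    """Extract frame-level MFCC and CENS features from one monophonic clip."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
    cens = librosa.feature.chroma_cens(y=y, sr=sr)           # shape: (12, frames)
    return mfcc.T, cens.T                                     # time-major, e.g. for RNN input

def fuse_scores(p_mfcc, p_cens, w=0.5):
    """Score-level fusion: weighted average of per-class probabilities from
    the MFCC-based and CENS-based classifiers, then argmax for the label."""
    scores = w * p_mfcc + (1.0 - w) * p_cens
    return np.argmax(scores, axis=-1)

# Usage (hypothetical trained models):
# x_mfcc, x_cens = extract_features("clip.wav")
# label = fuse_scores(model_mfcc.predict(x_mfcc[None]), model_cens.predict(x_cens[None]))
```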
