Abstract

We present a multi-modal genre recognition framework that covers the audio, text, and image modalities through features extracted from audio signals, album cover images, and lyrics of music tracks. In contrast to related work that learns features purely with a neural network, handcrafted features designed for each modality are also integrated, which makes the resulting models more interpretable and enables further theoretical analysis of the impact of individual features on genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre, based on combinations of elementary features. Features are combined with a two-level technique that couples aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments were conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually; for improved perceptibility, the data are reduced by multi-objective analysis and restriction to non-dominated results. Feature- and classifier-related hypotheses are formulated from the data, and their statistical significance is formally analyzed. The analysis shows that combining two modalities almost always leads to a significant increase in performance, and combining three modalities does so in several cases.
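The two-level combination described above can be sketched as follows: level one aggregates variable-length frame features into a fixed-length vector, and level two fuses the binary-genre confidences of several feature-vector-based predictions into one track-level decision. This is a minimal illustration, not the paper's implementation; the function names, the mean/standard-deviation aggregation, and the averaging fusion rule are assumptions chosen for clarity.

```python
def aggregate(frames):
    """Level 1 (illustrative): collapse a variable-length list of
    frame-level feature vectors into one fixed-length vector by
    concatenating per-dimension mean and standard deviation."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [(sum((f[d] - means[d]) ** 2 for f in frames) / n) ** 0.5
            for d in range(dim)]
    return means + stds

def fuse_confidences(confidences, threshold=0.5):
    """Level 2 (illustrative): average the confidences obtained from
    several feature-vector-based binary-genre predictions and report
    a positive classification if the mean exceeds the threshold."""
    mean_conf = sum(confidences) / len(confidences)
    return mean_conf, mean_conf > threshold
```

For example, `aggregate([[1, 2], [3, 4]])` yields `[2.0, 3.0, 1.0, 1.0]`, and `fuse_confidences([0.9, 0.4, 0.8])` fuses three per-vector confidences into a single positive decision for the genre.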

Highlights

  • Music genre recognition is one of the most common classification tasks in music information retrieval, with several hundreds of published studies mentioned by Sturm [1]

  • We have proposed a multi-modal genre recognition framework that covers the audio, text, and image modalities through features extracted from audio signals, album cover images, and lyrics of music tracks

  • Features known to be powerful in the audio, text, and image domains were selected, and an approach to combining them that meets the requirements of each modality's features was presented


Summary

Introduction

Music genre recognition is one of the most common classification tasks in music information retrieval, with several hundred published studies mentioned by Sturm [1]. We present a multi-modal genre recognition framework that considers audio, text, and image features of a music track, extracted from audio signals, album cover images, and lyrics. Besides combining features into fixed-length feature vectors, a second level of feature combination is employed: confidence-based fusion of predictions obtained from several feature-vector-based classifications. This allows a detailed representation of longer audio tracks by a length-dependent number of feature values. Data reduction techniques based on multi-objective analysis and restriction to non-dominated data are proposed and applied. Based on these data, feature- and classifier-related hypotheses are formulated and their significance is statistically tested.
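The restriction to non-dominated data mentioned above is the standard Pareto filter from multi-objective analysis: a result is kept only if no other result is at least as good in every objective and strictly better in at least one. A minimal sketch (assuming maximization of all objectives; the function name and the quadratic scan are illustrative only):

```python
def non_dominated(points):
    """Return the points not dominated by any other point.
    A point p is dominated if some other point q is >= p in every
    objective and differs from p (i.e., is strictly better in at
    least one objective). All objectives are maximized."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(qi >= pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

Applied to evaluation results with objectives such as (accuracy, interpretability), this removes every configuration that is outperformed on all objectives simultaneously, which is what shrinks the data for the visual analysis.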

Related Work
Audio Features
Text Features
Image Features
Evaluation
Configuration Of Experiments
Visual Data Analysis
Removal of Dominated Results
Filtering of Less Relevant Results
Feature-Related Hypotheses
Classifier-Related Hypotheses
Conclusions and Future Work

