Time–frequency scattering accurately models auditory similarities between instrumental playing techniques

Vincent Lostanlen, Christian El-Hajj, Mathieu Lagrange, Mathias Rossignol, Grégoire Lafay, Joakim Andén

doi:10.1186/s13636-020-00187-z

Abstract

Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called “ordinary” technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human participants to organize 78 isolated notes into a set of timbre clusters. Our analysis of their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time–frequency scattering to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of 99.0% ± 1. An ablation study demonstrates that removing either the joint time–frequency scattering transform or the metric learning algorithm noticeably degrades performance.
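To make the retrieval metric concrete, here is a minimal NumPy sketch of a rank-five precision score. It assumes that AP@5 is the fraction of each query's five nearest neighbors that share the query's cluster label, averaged over all queries, with Euclidean distance as the ranking criterion; the article's exact definition and distance may differ. The function name `precision_at_k` is hypothetical.

```python
import numpy as np

def precision_at_k(features, labels, k=5):
    """Mean precision at rank k for nearest-neighbor retrieval (hypothetical helper).

    Assumption: every item serves once as a query, the other items are ranked
    by Euclidean distance, and a retrieved item is relevant when it shares the
    query's cluster label.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between all items.
    sq = np.sum(features ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    # A query must not retrieve itself.
    np.fill_diagonal(dists, np.inf)
    # Indices of the k nearest neighbors of every query.
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # True where a retrieved neighbor belongs to the query's cluster.
    hits = labels[neighbors] == labels[:, None]
    return hits.mean()
```

To score a learned metric rather than the raw features, one could project the features with a linear map `L` (for instance `features @ L.T`) before calling the function.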

Highlights

Music information retrieval (MIR) operates at two levels: symbolic and auditory [1]
The sole mention of a playing technique does not specify its effect in terms of auditory perception
The previous section described our methods for extracting spectrotemporal modulations in audio signals, as well as learning a non-Euclidean similarity metric between them.
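The learned metric is of Mahalanobis type: distances are measured as d(i, j) = ‖L(x_i − x_j)‖², with M = LᵀL. The NumPy sketch below evaluates the objective from Weinberger and Saul's LMNN formulation for a fixed map L: a "pull" term that draws each item toward its target neighbors, and a triplet hinge "push" term that penalizes differently labeled impostors lying within a unit margin. It only evaluates the loss and does not optimize L. It is not the authors' implementation, and the names `lmnn_loss`, `targets`, `mu`, and `margin` are hypothetical.

```python
import numpy as np

def lmnn_loss(L, X, targets, labels, mu=0.5, margin=1.0):
    """Evaluate an LMNN-style objective for a fixed linear map L (sketch).

    X has shape (n, d); L has shape (d_out, d).
    targets[i] lists the precomputed target neighbors of item i (same label).
    """
    Z = X @ L.T  # project every item with L

    def d(i, j):
        # Squared Mahalanobis distance ||L (x_i - x_j)||^2.
        diff = Z[i] - Z[j]
        return float(diff @ diff)

    pull, push = 0.0, 0.0
    n = len(X)
    for i in range(n):
        for j in targets[i]:
            dij = d(i, j)
            pull += dij  # draw target neighbors closer
            for l in range(n):
                if labels[l] != labels[i]:
                    # Triplet (i, j, l): impostor l should lie at least
                    # `margin` farther from i than target neighbor j.
                    push += max(0.0, margin + dij - d(i, l))
    return (1.0 - mu) * pull + mu * push

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
labels = [0, 0, 1, 1]
targets = [[1], [0], [3], [2]]
print(lmnn_loss(np.eye(2), X, targets, labels))             # no impostors
print(lmnn_loss(np.diag([0.1, 1.0]), X, targets, labels))   # clusters squashed together
```

Shrinking the first axis pulls the two clusters into each other's margins, so the second map incurs a larger hinge penalty. An LMNN solver searches for the L that minimizes this objective.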

Summary

Introduction

Music information retrieval (MIR) operates at two levels: symbolic and auditory [1]. By relying on a notation system, the symbolic level allows the comparison of musical notes in terms of quantitative attributes, such as duration, pitch, and intensity at the source. Symbolic representations describe timbre indirectly, either via visuotactile metaphors (e.g., bright or rough [3]) or via an instrumental playing technique (e.g., bowed or plucked) [4]. Despite their widespread use, purely linguistic references to timbre fail to convey the intention of the composer. The sole mention of a playing technique does not specify its effect in terms of auditory perception. For example, the term breathy alludes to a playing technique that is specific to wind instruments, yet a cellist may achieve a seemingly breathy timbre by bowing near the fingerboard, i.e., sul tasto in classical terminology. In a diverse instrumentarium, the semantic similarity between playing technique denominations does not reflect such acoustical similarity [6].

Journal: EURASIP Journal on Audio, Speech, and Music Processing	Publication Date: Jan 11, 2021
Citations: 7	License type: open-access

Similar Papers

Fast LMNN Algorithm through Random Sampling
Kaiyuan Wu ... Zhiming Zheng
01 Nov 2015

Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository
Mohammad Mahmudur Rahman Khan ... Rezoana Bente Arif
01 Sep 2018

Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning
Kun Song ... Junwei Han
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 31
13 Feb 2017

Large Margin Nearest Neighbor Classification With Privileged Information for Biometric Applications
Jingwen He ... Dong Xu
IEEE Transactions on Circuits and Systems for Video Technology | VOL. 30
23 Jul 2019
