Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

Sławomir K Zieliński, Hyunkook Lee

doi:10.3390/app9091724

Open Access

https://doi.org/10.3390/app9091724

Abstract

The aim of the study was to develop a method for the automatic classification of three spatial audio scenes, differing in the horizontal distribution of foreground and background audio content around a listener, in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regard to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator (LASSO) was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits satisfactory classification accuracy when it is trained and then tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that, in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
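As a rough illustration of the pipeline the abstract describes, the sketch below renders a mono signal binaurally by convolving it with a left-ear and a right-ear impulse response, then derives one simple binaural cue, a broadband interaural level difference (ILD), from the RMS levels of the two ear signals. This is not the authors' implementation: the function names, the toy impulse responses, and the choice of a single broadband ILD are all assumptions made for illustration, and real BRIRs are thousands of samples long.

```python
import math


def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def rms(x):
    """Root mean square level of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def render_binaural(mono, brir_left, brir_right):
    """Return (left, right) ear signals for one source position."""
    return convolve(mono, brir_left), convolve(mono, brir_right)


def interaural_level_difference_db(left, right, eps=1e-12):
    """Broadband ILD in dB; eps guards against silent channels."""
    return 20.0 * math.log10((rms(left) + eps) / (rms(right) + eps))


# Hypothetical toy data: the right-ear response is the left-ear
# response scaled by 0.5, so the left ear is louder by a factor of two.
mono = [1.0, 0.5, -0.25, 0.0]
brir_left = [1.0, 0.0, 0.3]
brir_right = [0.5, 0.0, 0.15]

left, right = render_binaural(mono, brir_left, brir_right)
ild_db = interaural_level_difference_db(left, right)
print(f"ILD: {ild_db:.2f} dB")
```

In a fuller system, cues of this kind would be computed per frequency band and per time frame, combined with spectral descriptors such as the Mel-frequency cepstral coefficients, and passed to the classifier.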

Highlights

This study builds on the work on the spatial audio scene characterization of five-channel surround sound recordings undertaken by Zieliński [23,24].
The aim of the first experiment was to check how the method performed when trained and tested on the excerpts synthesized using the same sets of BRIRs.
The exception was the model obtained for the BRIR set No. 7, for which the accuracy attained for the spectral features and the root mean square (RMS)-based metrics was equal to 75.8% and 69.2%, respectively.

Summary

Background

Binaural audio technology is rapidly gaining popularity. For example, it is widely used for the rendering of 360° virtual reality content in one of the most popular video-sharing Internet services [1]. Multiple-source localization models have been developed [6,7,8,9], an important step towards the quantification of higher-level attributes (e.g., ensemble width) and, ultimately, a holistic characterization of complex spatial audio scenes. While their reported accuracy is deemed to be good, their applicability is limited, as they often require a priori knowledge about the number of sources of interest and their signal characteristics. The above considerations underlay the work described in this paper.

Aims of the Study

Method Overview

Corpus of Binaural Audio Recordings

Raw Audio Material

Database of Binaural Room Impulse Responses

Synthesis of Binaural Recordings

Feature Extraction

Binaural Cues

Spectral Features

Classification Algorithm

Results

Experiment 1

Experiment 2

Principal Component Analysis

Discussion

Conclusions

Journal: Applied Sciences	Publication Date: Apr 26, 2019
Citations: 9	License type: CC BY 4.0
