Abstract

Object-based audio presents the opportunity to optimize audio reproduction for different listening scenarios. Vector base amplitude panning (VBAP) is typically used to render object-based scenes. Optimizing this process based on knowledge of the perception and practices of experts could result in significant improvements to the end user's listening experience. An experiment was conducted to investigate how content creators perceive changes in the perceptual attributes of the same content rendered to systems with different numbers of channels, and to determine what they would do differently from standard VBAP and matrix-based downmixes to minimize these changes. Text mining and clustering of the content creators' responses revealed six general mix processes: the spatial spread of individual objects, EQ and processing, reverberation, position, bass, and level. Logistic regression models show the relationships between the mix processes, perceived changes in perceptual attributes, and the rendering method/speaker layout. The relative frequency of use of the different mix processes was found to differ between categories of audio object, suggesting that any downmix rules should be object-category specific. These results give insight into how object-based audio can be used to improve listener experience and provide the first template for doing so across different reproduction systems.

Highlights

  • Object-based broadcast has been described as the “logical step” in broadcast technology [1]; this is reflected in current large scale research projects [2,3,4], standardization activities [5, 6], interest from broadcasters [1], and commercialization [7, 8]

  • Stopwords are words that are disregarded in the text mining process because they offer little predictive power; this may be because they are common words within the language (e.g., a, the, and) or because they are common within the domain being investigated (a minimal filtering sketch follows this list)

  • This paper has presented the results of an experiment designed to identify a small number of the most common mix processes used by sound designers when mixing object-based content to loudspeaker systems with different numbers of channels
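
As a concrete illustration of the stopword filtering mentioned in the second highlight, the sketch below tokenizes a free-text response and drops both generic English stopwords and domain-specific ones. The stopword list, the tokenizer, and the example response are illustrative assumptions only, not the list or pipeline used in the study.

```python
import re

# Illustrative stopword list: a few generic English stopwords plus domain
# terms assumed to occur in nearly every response (hypothetical choices,
# not the list used by the authors).
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "is", "it",
             "mix", "sound", "audio"}

def tokenize_and_filter(text):
    """Lower-case the response, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(tokenize_and_filter(
    "I would reduce the reverberation and narrow the spread of the audio objects"))
# ['i', 'would', 'reduce', 'reverberation', 'narrow', 'spread', 'objects']
```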

Introduction

Object-based broadcast has been described as the “logical step” in broadcast technology [1]; this is reflected in current large-scale research projects [2,3,4], standardization activities [5, 6], interest from broadcasters [1], and commercialization [7, 8]. Object-based audio (OBA) is an approach to sound storage, transmission, and reproduction whereby individual audio objects with associated metadata are transmitted and rendered at the client side of the broadcast chain. An object will typically consist of an audio signal and metadata indicating the object’s position and level; objects may also contain semantic metadata indicating, for example, the language of a dialogue track or whether the object is positioned on or off screen. This is in contrast to traditional channel-based audio, where pre-rendered content for a fixed reproduction system is broadcast. There are many open questions regarding how to render OBA content optimally for different reproduction systems.
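
Since VBAP is the baseline renderer referred to throughout, a minimal sketch of pairwise/triplet-wise VBAP gain computation may help fix ideas. It follows Pulkki's standard formulation (gains obtained by inverting the matrix of loudspeaker direction vectors, then power-normalised); the function name, the NumPy implementation, and the ±30° stereo example are illustrative choices here, not the renderer used in the experiment.

```python
import numpy as np

def vbap_gains(source_dir, speaker_dirs):
    """Gains for one loudspeaker pair (2-D) or triplet (3-D) enclosing the source.

    source_dir   -- unit vector pointing from the listener towards the virtual source
    speaker_dirs -- array whose rows are the unit direction vectors of the
                    2 (or 3) loudspeakers forming the active pair (or triplet)
    """
    L = np.asarray(speaker_dirs, dtype=float)
    p = np.asarray(source_dir, dtype=float)
    # Solve g @ L = p for the unnormalised gain factors.
    g = np.linalg.solve(L.T, p)
    if np.any(g < -1e-9):
        raise ValueError("source direction lies outside this pair/triplet")
    # Normalise so the gains have unit energy (sum of squares = 1).
    return g / np.linalg.norm(g)

# Example: listener faces +y; azimuth is measured anticlockwise from the +x axis,
# so the front is 90 deg and directions left of front are >90 deg. A source
# 15 deg to the left is rendered on a standard +/-30 deg stereo pair.
def unit(az_deg):
    az = np.radians(az_deg)
    return np.array([np.cos(az), np.sin(az)])

source = unit(105)                       # 15 deg left of front
speakers = np.array([unit(120),          # left loudspeaker  (+30 deg)
                     unit(60)])          # right loudspeaker (-30 deg)
print(vbap_gains(source, speakers))      # left gain > right gain, as expected
```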
