Abstract

The term “immersive audio” is frequently used to describe an audio experience that provides the listener with the sensation of being fully immersed or “present” in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers arranged horizontally around the listener), 3D audio (with loudspeakers at, above, and below the ear level of the listener), and binaural audio over headphones. This article provides an overview of two recent standards that support the bitrate-efficient carriage of high-quality immersive sound. The first is MPEG-H 3D audio, a versatile standard that supports multiple immersive sound signal formats (channels, objects, and higher order ambisonics) and is now being adopted in broadcast and streaming applications. The second is MPEG-I immersive audio, an extension of MPEG-H 3D audio that is currently under development and targets virtual and augmented reality applications. It will support rendering of fully user-interactive immersive sound for three degrees of user movement [three degrees of freedom (3DoF)], i.e., yaw, pitch, and roll head movement, and for six degrees of user movement [six degrees of freedom (6DoF)], i.e., 3DoF plus translational movement of the user position along x, y, and z.
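To make the 3DoF/6DoF distinction concrete, the sketch below computes where a sound source appears relative to a listener. Under 3DoF, only head rotation (yaw, pitch, roll) changes the apparent direction. Under 6DoF, the listener position also moves, so the direction and distance change as well. This is an illustrative sketch, not the MPEG-I renderer. The coordinate frame (x forward, y left, z up), the Z-Y-X rotation order, and the function names `head_rotation` and `source_in_head_frame` are all assumptions chosen for this example.

```python
import math


def head_rotation(yaw: float, pitch: float, roll: float) -> list[list[float]]:
    """Head-to-world rotation R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians.

    Assumed right-handed frame: x forward, y left, z up, so positive yaw
    turns the head to the left. This convention is illustrative only.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def source_in_head_frame(source, listener_pos, yaw, pitch, roll):
    """Return (azimuth_deg, elevation_deg, distance) of a source seen from the listener.

    3DoF: keep listener_pos at the origin and vary only yaw/pitch/roll.
    6DoF: additionally update listener_pos as the user translates in x, y, z.
    """
    # Translational part (only changes under 6DoF).
    d = [s - p for s, p in zip(source, listener_pos)]
    r = head_rotation(yaw, pitch, roll)
    # World-to-head transform: apply the transpose R^T to the offset vector.
    x, y, z = (sum(r[row][col] * d[row] for row in range(3)) for col in range(3))
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))  # positive = to the listener's left
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation, distance


# 3DoF: source straight ahead, listener turns the head 90 degrees to the left.
print(source_in_head_frame((1, 0, 0), (0, 0, 0), math.radians(90), 0, 0))
# 6DoF: no head rotation, listener steps 1 m to the left.
print(source_in_head_frame((1, 0, 0), (0, 1, 0), 0, 0, 0))
```

In the first call the source moves to about -90° azimuth (the listener's right) at unchanged distance. In the second call, translation alone shifts it to about -45° at roughly 1.414 m. A 3DoF renderer cannot reproduce that second effect because it ignores listener position.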

Highlights

  • The term “immersive audio” is often used to characterize the latest generation of sound systems that aim at providing an audio experience that conveys to the listener the sensation of being fully immersed into or “present” in a surrounding sound scene

  • The main part of the incoming Moving Picture Experts Group (MPEG)-H 3D audio bitstream is decoded by the core decoder, which reproduces the encoded waveforms that represent either channel signals, object signals, or higher order ambisonics (HOA) coefficient signals

  • In MPEG-I, the user can move around in the world created by the media presentation, with head movement alone or with both head and body movement in virtual space; audio presentation is assumed to be via headphones
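The highlights mention HOA coefficient signals as one of the waveform types the core decoder reproduces. As general ambisonics background (not a figure stated in this summary), a full-sphere HOA representation of order N uses (N + 1)^2 coefficient signals, so the number of transmitted signals grows quadratically with order. The helper name `hoa_signal_count` below is hypothetical.

```python
def hoa_signal_count(order: int) -> int:
    """Number of coefficient signals in a full-sphere (3D) ambisonics representation."""
    if order < 0:
        raise ValueError("ambisonics order must be non-negative")
    # One signal per spherical harmonic: 2n + 1 harmonics at each degree n = 0..order.
    return (order + 1) ** 2


for n in range(1, 5):
    print(f"order {n}: {hoa_signal_count(n)} coefficient signals")
```

First-order ambisonics therefore carries 4 signals, while fourth order already needs 25, which is one reason bitrate-efficient coding of HOA matters.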


Summary

INTRODUCTION

The term “immersive audio” is often used to characterize the latest generation of sound systems that aim to provide an audio experience that conveys to the listener the sensation of being fully immersed into or “present” in a surrounding sound scene. Early sound reproduction systems provided stereophonic sound over two loudspeakers, giving the listener an illusion of left-right (and depth) perception within a limited frontal sound field [1], [2]. The second generation added a 360° “surround” experience: by adding loudspeakers from all horizontal directions (e.g., 5.1 and 7.1 [3]–[5]), it extended the presented sound stage to the extreme left and right as well as to sound from behind the listener. This already provides a significant degree of user immersion into the sound field. MPEG-H is described in [11]; it is a foundational technology for MPEG-I audio and therefore requires a brief description here so that the rest of this article can be understood.

MPEG-H AUDIO
Overview and Concepts
Waveform Coding
Format Conversion
Object Rendering
HOA Decoding and Rendering
Performance
MPEG-I IMMERSIVE AUDIO
MPEG-I 3DoF Audio
Requirements for MPEG-I 6DoF Audio
Developing MPEG-I 6DoF Immersive Audio
CONCLUSION