Abstract
The term “immersive audio” is frequently used to describe an audio experience that gives the listener the sensation of being fully immersed or “present” in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers arranged horizontally around the listener), 3D audio (with loudspeakers at, above, and below listener ear level), and binaural audio over headphones. This article provides an overview of two recent standards that support the bitrate-efficient carriage of high-quality immersive sound. The first is MPEG-H 3D audio, a versatile standard that supports multiple immersive sound signal formats (channels, objects, and higher order ambisonics) and is now being adopted in broadcast and streaming applications. The second is MPEG-I immersive audio, an extension of 3D audio that is currently under development and targets virtual and augmented reality applications. It will support rendering of fully user-interactive immersive sound for three degrees of user movement [three degrees of freedom (3DoF)], i.e., yaw, pitch, and roll head movement, and for six degrees of user movement [six degrees of freedom (6DoF)], i.e., 3DoF plus translational movement of the user position along x, y, and z.
Highlights
The term “immersive audio” is often used to characterize the latest generation of sound systems that aim at providing an audio experience that conveys to the listener the sensation of being fully immersed into or “present” in a surrounding sound scene
The main part of the incoming Moving Picture Experts Group (MPEG)-H 3D audio bitstream is decoded by the core decoder, which reproduces the encoded waveforms that represent channel signals, object signals, or higher order ambisonics (HOA) coefficient signals
In MPEG-I, the user can move around in the world created by the media presentation, with head movement or both head movement and body movement in virtual space, where we assume that audio presentation is done via headphones
Summary
The term “immersive audio” is often used to characterize the latest generation of sound systems that aim to give the listener the sensation of being fully immersed or “present” in a surrounding sound scene. Early sound reproduction systems provided stereophonic reproduction over two loudspeakers, giving the listener an illusion of left-right (and depth) perception within a limited frontal sound field [1], [2]. The second generation added a 360° “surround” experience: by adding more loudspeakers in all horizontal directions (e.g., 5.1 and 7.1 [3]–[5]), it extended the presented sound stage to the extreme left and right and to behind the listener. This already provides a significant degree of user immersion in the sound field. MPEG-H is described in [11]; it is a foundational technology for MPEG-I audio and therefore requires some description here to make this article understandable to the reader.