MPEG Unified Speech and Audio Coding – Bridging the Gap

Markus Multrus, Bernhard Grill, Julien Robilliard, Max Neuendorf, Daniel Fischer, Guillaume Fuchs, Jeremie Lecomte, Frederik Nagel, Stefan Bayer, Johannes Hilpert, Christian Helmrich, Ralf Geiger, Nikolaus Rettelbach, Stephan Wilde, Sascha Disch

doi:10.1007/978-3-642-23071-4_33

Abstract

Speech and audio coding schemes originate from different worlds. Speech coding schemes typically assume a source model, i.e. the human vocal tract. General audio coding schemes primarily rely on a sink model, i.e. the human auditory system. While speech coding schemes work well at very low rates for the signal class they were designed for, they are known to fail for general audio signals even at higher rates. In contrast, general audio coders work well for any content at higher rates, but typically have limited performance at very low rates, especially for speech signals. Recently the ISO/MPEG group started a standardization activity to develop a new Unified Speech and Audio Coding scheme. A state-of-the-art AAC-based general audio coder, featuring transform coding, parametric bandwidth extension and parametric stereo coding, was extended by source model coding tools. All codec modules were further improved and revised for enhanced performance, in particular at very low bitrates. The new unified coding scheme outperforms dedicated speech and general audio coding schemes and bridges the gap between both worlds. This paper describes the new codec in detail and shows how the goal of consistent high quality for all signal types is reached.
