Abstract

Pipelining has been applied in many areas to improve system performance by overlapping the execution of hardware or software computing stages. However, directly pipelining H.264 decoding is difficult because video bitstreams are encoded with many dependencies, leaving little parallelism to exploit. Fortunately, pure software pipelining can still be applied to H.264 decoding at the macroblock level with a reasonable performance gain. The pipeline stages, however, may need to synchronize with each other, which incurs substantial extra overhead. For optimized decoders, this overhead is relatively more significant, and software pipelining might even degrade performance. We first group multiple stages into larger batches and then execute these batches concurrently, an approach we call batch-pipelining, to exploit more parallelism on multicore systems. Experimental results show that it can speed up decoding by up to 89% and achieve up to 259 and 69 frames per second for 720p and 1080p resolutions, respectively, relative to an optimized H.264 decoder on a 4-core x86 machine. Because of its flexibility, batch-pipelining can be applied not only to H.264 but also to many similar applications, such as the next-generation video coding standard, High Efficiency Video Coding (HEVC). Therefore, we believe the batch-pipelining mechanism opens an effective new direction for software codec development.
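
The batching idea above can be illustrated with a minimal sketch. It is not the decoder described in this work: the function `run_batch_pipeline`, the two placeholder stages, and the choice of a queue as the hand-off mechanism are all assumptions made for illustration. The sketch only shows why handing work between stages in batches, rather than one macroblock at a time, reduces the number of synchronization points.

```python
import queue
import threading


def run_batch_pipeline(items, batch_size, stage_a, stage_b):
    """Hypothetical two-stage pipeline that hands work off in batches.

    stage_a and stage_b stand in for decoding stages; they are placeholders,
    not real H.264 routines. Returns the results and the number of
    inter-stage hand-offs (synchronization points).
    """
    handoff = queue.Queue()
    results = []
    handoffs = 0

    def producer():
        nonlocal handoffs
        for start in range(0, len(items), batch_size):
            # Process a whole batch in stage A before synchronizing once.
            batch = [stage_a(x) for x in items[start:start + batch_size]]
            handoff.put(batch)
            handoffs += 1
        handoff.put(None)  # sentinel: no more batches

    def consumer():
        while True:
            batch = handoff.get()
            if batch is None:
                break
            # Stage B works on one batch while stage A prepares the next.
            results.extend(stage_b(x) for x in batch)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handoffs
```

With ten items, a batch size of 4 needs three hand-offs, whereas a batch size of 1 needs ten. Note that Python threads do not run CPU-bound code in parallel under the global interpreter lock, so this sketch shows the structure of the hand-off only, not the multicore speedup.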
