Abstract

Extensive efforts have been made to design hardware-based integer motion estimation (IME), which is much faster than software-based IME but suffers from degraded coding efficiency. This is because previous designs simply modified the fast IME algorithm to fit a given hardware architecture, at the expense of coding efficiency. This paper proposes a novel hardware design for IME that not only offers real-time processing capability but also provides a flexible tradeoff between computational complexity and coding efficiency. First, a prediction unit (PU) loop unrolling scheme is proposed to solve the pipeline stall problem inherent in fast IME algorithms such as the test zone search (TZS); it reduces idle cycles by 89.24%. Next, to further reduce the computational complexity of the TZS algorithm, the computational redundancy among PUs within a coding unit is reduced through a search step synchronization and search point sharing scheme, which lowers the computational complexity by 72.25%. Because the proposed schemes eliminate the inefficiencies of the hardware design, they do not suffer from serious degradation in coding efficiency. Consequently, the proposed hardware-based IME processes $7680\times4320$ videos at 30 frames per second while increasing the Bjøntegaard delta bitrate by only 0.90% on average. The hardware design is synthesized using a 65 nm general-purpose CMOS technology, and its gate count is 268.5K at an operating clock frequency of 500 MHz.
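To put the real-time claim in perspective, the following minimal sketch estimates the average cycle budget implied by the stated resolution, frame rate, and clock frequency. The 64×64 coding tree unit (CTU) size and the function name `cycles_per_ctu` are assumptions made for illustration; the abstract does not state the block size used by the design, and the result is a back-of-the-envelope figure rather than a number reported by the authors.

```python
import math

# Figures taken from the abstract.
WIDTH, HEIGHT = 7680, 4320      # luma resolution in pixels
FPS = 30                        # target frame rate
CLOCK_HZ = 500_000_000          # 500 MHz operating clock

# Assumption (not stated in the abstract): 64x64 coding tree units.
CTU_SIZE = 64


def cycles_per_ctu(width, height, fps, clock_hz, ctu_size):
    """Average number of clock cycles available per CTU at real-time speed."""
    # Partial CTUs at the frame border still occupy a full processing slot.
    ctus_per_row = math.ceil(width / ctu_size)
    ctu_rows = math.ceil(height / ctu_size)
    ctus_per_second = ctus_per_row * ctu_rows * fps
    return clock_hz / ctus_per_second


budget = cycles_per_ctu(WIDTH, HEIGHT, FPS, CLOCK_HZ, CTU_SIZE)
print(f"Average budget: {budget:.0f} cycles per {CTU_SIZE}x{CTU_SIZE} CTU")
```

Under these assumptions the budget is roughly 2,042 cycles per CTU, shared by every PU size and every search point evaluated for that block, which suggests why pipeline idle cycles and redundant search points matter at this resolution.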
