Abstract

Motion estimation (ME) serves as a key tool in a variety of video coding standards. With the increasing need for higher resolution video format, the limited memory bandwidth becomes a bottleneck for ME implementation. The huge data loading from external memory to the on-chip memory and the frequent data fetching from the on-chip memory to the ME engine are two major problems. To reduce both off-chip and on-chip memory bandwidth, we propose a two-layer data reuse scheme. On the macroblock (MB) layer, an advanced Level C data reuse scheme is presented. It employs two cooperating on-chip caches which load data in a novel local-snake scanning manner. On the block layer, we propose a fast variable block size motion estimation with a refined search window (RSW-VBSME). A new approach for hardware implementation of VBSME is then employed based on the fast algorithm. Instead of obtain the SADs of all the modes at the same time, the ME of different block sizes are performed separately. This enables higher data reusability within an MB. The two-layer data reuse scheme archives a more than 90% reduction of off-chip memory bandwidth with a slight increase of on-chip memory size. Moreover, the on-chip memory bandwidth is also greatly reduced compared with other reuse methods with different VBSME implementations.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call