This paper proposes an efficient VLSI architecture for a hierarchical block matching algorithm for motion estimation. At the lowest resolution level, two motion vector (MV) candidates are selected to improve performance. At the next search level, these two candidates serve as the center points of local searches that yield a single MV candidate. Then, at the next level and at the finest level, one MV candidate is chosen from a single local search area (LSA) defined by the MV candidate obtained from the lower resolution level. The architecture requires nine processing elements, and data are processed in such a way that the computation of the frames at different resolutions is overlapped with the MV calculation. Simulation results indicate that this architecture is more area-efficient and faster than many full-search, three-step-search and multiresolution architectures, which makes it suitable for SD and HD video. To avoid the delay due to pipelining, the MVs of all macro-blocks are calculated for one resolution level and stored in RAM to define the LSAs for the next resolution level. The architecture, with about 16 K gates, is implemented for a search range of [-15, +15]. Because it requires only two-port memory, which is common in most consumer electronics systems, it can be integrated easily into an existing system at the expense of a very small area.
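The search order described above can be illustrated in software. The following is a minimal Python sketch, not the proposed hardware: it keeps two MV candidates at the coarsest level, refines both at the next level to obtain one, and then refines that single candidate down to full resolution. Several details are assumptions that the abstract does not state: four resolution levels, 2x2 averaging to build the lower-resolution frames, a ±1 search window at every level (nine search points), the sum of absolute differences (SAD) as the matching cost, and 16x16 macro-blocks. All function and parameter names are hypothetical.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 pixel group (assumed filter)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def sad(cur, ref, bx, by, n, dx, dy):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced by (dx, dy). Returns None if the displaced block leaves the frame."""
    h, w = len(ref), len(ref[0])
    if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
        return None
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def local_search(cur, ref, bx, by, n, center, radius, keep=1):
    """Evaluate every MV within +/-radius of center; return the `keep` best
    (cost, (dx, dy)) pairs, lowest cost first."""
    cx, cy = center
    scored = []
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            cost = sad(cur, ref, bx, by, n, dx, dy)
            if cost is not None:
                scored.append((cost, (dx, dy)))
    scored.sort()
    return scored[:keep]


def hierarchical_search(cur, ref, bx, by, n=16, levels=4, radius=1):
    """Coarse-to-fine MV search for the block at (bx, by).

    Assumes bx, by and n are multiples of 2**(levels - 1) and that the block
    lies inside the frame. levels=4 and radius=1 are assumptions, not values
    given in the abstract.
    """
    # Build the resolution pyramids; index 0 is the finest level.
    cur_pyr, ref_pyr = [cur], [ref]
    for _ in range(levels - 1):
        cur_pyr.append(downsample(cur_pyr[-1]))
        ref_pyr.append(downsample(ref_pyr[-1]))

    # Lowest resolution: keep the two best MV candidates.
    top = levels - 1
    s = 1 << top
    best_two = local_search(cur_pyr[top], ref_pyr[top],
                            bx // s, by // s, n // s, (0, 0), radius, keep=2)
    centers = [mv for _, mv in best_two]

    # Finer levels: one local search per surviving center, keep one winner.
    # Only the first refinement has two centers; later ones have a single LSA.
    for lvl in range(top - 1, -1, -1):
        s = 1 << lvl
        results = []
        for dx, dy in centers:
            results += local_search(cur_pyr[lvl], ref_pyr[lvl],
                                    bx // s, by // s, n // s,
                                    (2 * dx, 2 * dy), radius, keep=1)
        centers = [min(results)[1]]
    return centers[0]
```

Under these assumptions the largest reachable displacement is 8 + 4 + 2 + 1 = 15 pixels per axis, which matches the stated search range of [-15, +15], although the abstract does not confirm that this is how the range is partitioned across levels. A call such as `hierarchical_search(cur, ref, 16, 32)` returns the `(dx, dy)` vector for the macro-block whose top-left corner is at column 16, row 32.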